The EL2 Electret Transmitter: Analytical
Modeling, Optimization, and Design 


By J. C. BAUMHAUER, Jr. and A. M. BRZEZINSKI 
(Manuscript received February 21, 1979) 


We describe here the development of an EL2 Electret Transmitter 
that provides desirable attributes for use in hands-free-answer tele- 
phony and future electronic and special-purpose sets. The EL2 has 
lower sensitivity to spurious electromagnetic and mechanical signals 
than do existing magnetic transmitters. It offers lower dc power 
consumption, smaller size, and lower intrinsic noise and distortion 
than the carbon microphone. Formulation of an electro-mechano- 
acoustic electret model allows parameter optimization, in which side 
conditions on electrostatic stability and a prescribed transmit fre- 
quency response are adhered to. We show that higher sensitivities 
are possible with larger air film thickness, until the decreasing source 
capacitance becomes a limiting factor. Multiple diaphragm supports 
allow decreased film stress with no change in the stable electret 
charge or sensitivity. We describe theoretically a thermal stabiliza- 
tion procedure that minimizes long-range stress relaxation effects by 
accelerating viscoelastic changes. Based on film data, we project 
nominal sensitivity variations within ±1 dB over 20 years of service.
In the design, metallized poly(tetrafluoroethylene) electret film is 
tensioned, supported, and clamped above a selectively metallized 
stationary electrode forming three cells acoustically and electrically 
in parallel. A preamplifier completes the subassembly, which is 
housed in a rectangular aluminum enclosure shielding the transducer.
Typical EL2 parameters are −32 dB re 1 V/(N/m²) sensitivity at
1 kHz, 1 kΩ output impedance, 3.2 kHz response resonance frequency,
and 2 to 16 V required dc supply.




1. INTRODUCTION 


Electromagnetic transducers based on the modulation of a biasing 
magnetic field in an air film appeared as microphones (often referred 
to as transmitters in telephony), receivers, and loudspeakers very early 
in the development of telephony.’” Today, this mechanism is employed 
in general-purpose U- and L-type telephone receivers in the Bell 
System. A smaller electromagnetic transducer, the AF1, was used as 
the electroacoustic transmitter in speakerphone modules until mid-1978.
However, because of its operating principle, in recent years it
was recognized to be inherently susceptible to an increasing incidence 
of spurious electromagnetic signals at customer locations. At the same 
time, interest was growing in a new telephone transmitter for electronic 
and special-purpose residential sets as well as for hands-free-answer 
features in business sets. The former residential applications generally 
require lower dc power consumption and smaller size than are typical 
of the variable-resistance granular carbon transmitter® such as the T- 
type used in general-purpose sets. Moreover, the carbon transmitter, 
being dependent upon periodic mechanical agitation, is not suitable as 
the stationary transmitter in hands-free-answer applications. These 
events encouraged Bell Laboratories development efforts’ and the 
subsequent design of the EL2 Electret Transmitter. Compared to the 
AF1, the electret promised a greatly reduced sensitivity to electromag- 
netic as well as mechanical vibration interference. Its signal power 
efficiency is surpassed only by the carbon transmitter. However, com- 
pared to the latter, the electret could be used in applications where 
lower bias power consumption, smaller size, lower intrinsic noise and 
distortion, and the potential for longer service life were required.

A variable-capacitance transducer is based on the modulation of a 
biasing electrostatic field in an air film. In an electret, that field is 
provided by an “electret charge” distribution” typically implanted in, 
and near the surface of, a thin, solid, dielectric film abutting the air 
film. The externally biased condenser microphone was first demon-
strated by A. E. Dolbear® in 1878, although E. C. Wente’ was the first 
to develop a practical instrument in 1917 at the AT&T Research 
Laboratories. In 1928, S. Nishikawa and D. Nukiyama used a thick, 
wax plate electret element in building an early electret transducer.® In 
the United States, R. T. Rutherford was granted an electret micro- 
phone patent in 1935.° While the Japanese used wax electrets during 
World War II, their extremely low capacitance and unstable electret 
charge retention remained a problem.® In 1962, Sessler and West 
developed the electret-biased, polymer-film transducer®”® at Bell Lab- 
oratories. Made with charged thin Teflon† films, these microphones


† Registered trademark of E. I. DuPont de Nemours.
overcame the earlier limitations of self-biased units. Through the
1960s, large investments in the development and design of polymer- 
film electret microphones were made, largely outside telephony. Bell 
Canada introduced their electret transmitter design into an operator 
headset product in 1970” and later into a speakerphone. Recent efforts 
at Bell Laboratories culminated in initial Western Electric production 
shipments of the Bell System EL2 electret transmitter’? in May 1977. 
It has now replaced the AF1 in the 4A Speakerphone and is being used 
as the stationary transmitter in a number of business sets for hands- 
free-answer services.

This paper covers the transducer device aspects of the EL2 devel- 
opment and design, while a companion article’ will treat new tech- 
nological aspects of the project. Here, analytical modeling and param- 
eter optimization are first presented (Section II). Those results, cou- 
pled with physical, electrical, material, and telephone constraints are 
then shown applied in the physical design (Section III) to achieve 
performance and reliability objectives (Section IV). 


II. ANALYTICAL MODELING AND OPTIMIZATION†
2.1 Transducer physics and model


Capacitance microphones are based on the modulation of a biasing 
dc electric field in a gas dielectric; the field in turn modulates the
induced surface charge on an adjacent electrode. In the electret micro- 
phone (see Fig. 1), the biasing field in the air film is the result of a 
fixed “electret” surface charge’* which may be considered trapped on 
an imaginary electrode at the surface of the polymer film adjacent the 
air film. Free charge is thus induced in the “stationary” electrode. 
Assuming a homogeneous isotropic linear solid dielectric and a one- 
dimensional electric field normal to the stationary electrode, an appli- 
cation of (i) the charge equation of electrostatics, (ii) Gauss’s equation
on the line integral of electric field, and (iii) jump conditions on the 
continuity of electric displacement’? may be shown” to yield the 
electric field in the air film and the induced free surface charge on the 
stationary electrode as, respectively, 


$E_3^1 = (V_E + \sigma d/\epsilon)/(h^1 + d\epsilon_0/\epsilon),$
$\sigma_b^1 = \epsilon_0 E_3^1.$ (1)


Here, $\epsilon$ and $\epsilon_0$ are the dielectric permittivity of the polymer and air
films, respectively, $-\sigma$ is the negative electret surface charge per unit
area, $d$ and $h^1$ are the polymer and air film thickness, respectively, and
an externally applied dc voltage $V_E$ across the stationary and moving


† See appendix for partial list of symbols.
Fig. 1—Multicell electret transmitter: 1. Sound port and impedance material; 2. Front
acoustic chamber; 3. “Electret-charged” polymer film diaphragm; 4. Diaphragm metal- 
lized electrode; 5. Air film; 6. Stationary electrode with holes; 7. Rear acoustic chamber; 
8. Preamplifier. 


diaphragm electrodes has been assumed presently, in addition to the 
electret self-bias. Meter-kilogram-second units have been employed. 
The electrostatic traction, T, exerted on the polymer film diaphragm 
is given by the normal surface component of the Maxwell electrostatic 
stress tensor’’ in the air film, i.e.,


$T = \epsilon_0 (E_3^1)^2/2.$ (2)


Since T, together with the polymer film membrane tension, will deter- 
mine the biased diaphragm equilibrium displacement, it is clear from 
eqs. (2) and (1a) that electrostatically an electret bias charge $\sigma$ of
magnitude such that


$\sigma d/\epsilon = V_E$ (3)


is equivalent to an external voltage $V_E$ present in a nonelectret capac-
itance (condenser) microphone. This equivalence, previously shown by 
Warren et al.’® by employing energy considerations, can likewise be 
shown to hold for dynamic operation as first discovered by Sessler." 
Henceforth, only a passive electret self-bias will be assumed present; 
that is, we take $V_E = 0$ in eq. (1a). Of course, all results can easily be
applied to capacitance microphones using the equivalence (3). It is 
noted that quantities representing the biasing electrostatic equilibrium 
(or intermediate) state are being designated with the superscript “1.” 
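
As a numerical illustration of eqs. (1) to (3), the following sketch evaluates the equivalent bias voltage, the air-film field, the induced charge, and the electrostatic traction. The function name `bias_state` and the sample inputs are assumptions for illustration: the charge of 13 × 10⁻⁵ C/m², the 29-µm film, the 35.2-µm biased air film, and the relative permittivity of 2.08 are the EL2 values quoted later in the paper, read here so that $\sigma d/\epsilon$ comes out near the quoted 204 V.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m


def bias_state(sigma, d, h1, eps_r, v_ext=0.0):
    """Evaluate eqs. (1)-(3) for a one-dimensional electret air gap.

    sigma : electret surface charge magnitude, C/m^2
    d, h1 : polymer and biased air film thickness, m
    eps_r : relative permittivity of the polymer film
    v_ext : external dc bias V_E, volts (zero for a self-biased electret)
    """
    eps = eps_r * EPS0
    v_equiv = sigma * d / eps                          # eq. (3)
    field = (v_ext + v_equiv) / (h1 + d * EPS0 / eps)  # eq. (1a)
    induced = EPS0 * field                             # eq. (1b)
    traction = EPS0 * field ** 2 / 2.0                 # eq. (2)
    return v_equiv, field, induced, traction


v_equiv, field, induced, traction = bias_state(13e-5, 29e-6, 35.2e-6, 2.08)
print(f"equivalent bias voltage: {v_equiv:.1f} V")
print(f"air-film field:          {field:.3e} V/m")
print(f"induced surface charge:  {induced:.3e} C/m^2")
print(f"electrostatic traction:  {traction:.1f} N/m^2")
```

The induced charge on the stationary electrode comes out smaller than the electret charge itself, because part of the electret charge is imaged in the diaphragm metallization.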

In Fig. 1, a multicell electret diaphragm supported by ribs, spaced at 
distance 2a and each of length 2b, is shown in its equilibrium position.
Identical individual cells operating acoustically and electrically in 
parallel are formed. A membrane diaphragm’s effective “lumped pa- 
rameter” mass, stiffness, and area (all per unit cell) may be written, 
respectively,

$M_c = \kappa\rho d a^2, \qquad K_c = \gamma S, \qquad A_c = \beta a^2.$ (4)


Here, $\rho$ is the film mass per unit volume and $S$ is the applied membrane
force per unit edge, hereafter referred to as tension. For a rectangular 
diaphragm used in a capacitive transduction device, the constants may 
be shown to be appropriately based upon the average membrane 
displacement and are given by”® 


$\kappa = \pi^4\xi/16\Lambda, \qquad \gamma = \pi^2\kappa/4, \qquad \beta = 4\xi/\Lambda,$ (5)


where $\xi$ is the number of cells formed and $\Lambda = \xi 2a/2b$ is the prescribed
overall diaphragm aspect ratio; the prescribed overall diaphragm area
is then $\xi A_c$. Relations (5) were derived assuming one-dimensional
tension in the horizontal direction in Fig. 1, and with zero displacement 
at all boundaries surrounding each unit cell. Referring to the figure, 
the electrostatic displacement, w, of the effective rigid piston model is 
defined by 


$h^1 = h - w.$ (6)
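
The lumped constants of eqs. (4) and (5) can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, and the inputs (three cells, aspect ratio 1.74, overall area 7.3 × 10⁻⁵ m², 29-µm film, and 60.1 N/m tension) are the EL2 values given later in Section 2.3. The last line estimates the film-only resonance, neglecting the rear chamber stiffness and air mass loading that the full model includes.

```python
import math


def lumped_constants(n_cells, aspect):
    """Return (kappa, gamma, beta) of eq. (5)."""
    kappa = math.pi ** 4 * n_cells / (16.0 * aspect)
    gamma = math.pi ** 2 * kappa / 4.0
    beta = 4.0 * n_cells / aspect
    return kappa, gamma, beta


n_cells, aspect = 3, 1.74      # xi and Lambda
total_area = 7.3e-5            # xi * A_c, m^2
d = 29e-6                      # film thickness, m
rho = 2.23e3 + 38.6e-4 / d     # film plus metallization density, kg/m^3
tension = 60.1                 # S, N/m

kappa, gamma, beta = lumped_constants(n_cells, aspect)
cell_area = total_area / n_cells     # A_c
a = math.sqrt(cell_area / beta)      # half the rib spacing, from A_c = beta a^2
b = n_cells * a / aspect             # half the rib length, from Lambda = xi 2a/2b
mass = kappa * rho * d * a ** 2      # M_c, eq. (4a)
stiffness = gamma * tension          # K_c, eq. (4b)
f_film = math.sqrt(stiffness / mass) / (2.0 * math.pi)

print(f"cell size 2a x 2b: {2e3 * a:.2f} mm x {2e3 * b:.2f} mm")
print(f"M_c = {mass:.3e} kg, K_c = {stiffness:.0f} N/m")
print(f"film-only resonance: {f_film:.0f} Hz")
```

With $\beta = 4\xi/\Lambda$ the effective area $A_c$ equals the full cell area 4ab, which the sketch can confirm directly.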


The input sound port and acoustic material act as a damped mass,
$M_T$, of incompressible air, having effective area $A_T$ and damping
coefficient $R_T$, which add a degree of freedom to the system. When an
acoustic signal $p(t)$ is impressed upon this mass, coupling to the
diaphragm is effected through a front acoustic chamber stiffness $K_F$.
Diaphragm motion is likewise influenced by the rear acoustic chamber
stiffness $K_R$, thin air film damping” $R_F$, its own mechanical mass and
stiffness, and air mass loading. Due to the inherently low capacitance
of the combined cells, a preamplifier of high input impedance,


$Z_p = 1/(1/R_p + j\omega C_p),$ (7)


and near-unity gain A is housed in the microphone rear acoustic 
chamber. Hereafter, a bold-face quantity denotes that it is to be 
considered complex. 


2.2 Equations of motion and stability 


One may show that the electrostatic potential energy stored in the 
dielectric/electrode system in Fig. 1 may be written in terms of $\sigma$,
$\sigma_b^1$, $d$, and $h^1$. Energy and external work expressions for all mechanical,
electrical, and acoustic elements may likewise be written. With these
expressions, one can treat the coupled electrostatic-dynamic system 
by letting 


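$\sigma_b^1 \to \sigma_b^1 + \sigma_b(t), \qquad h^1 \to h^1 - u(t),$ (8)


$\sigma_b$ being the “small” time-varying charge and $u$ the small dynamic
diaphragm displacement. An application of Lagrange’s equation yields
the equations of motion’®


$\begin{bmatrix} P A_c \\ \cdots \\ 0 \end{bmatrix} = \begin{bmatrix} Z_K & 0 & Z_T \\ \cdots & \cdots & \cdots \\ -\chi & Z_E & 0 \end{bmatrix} \begin{bmatrix} U \\ I \\ Y \end{bmatrix},$ (9)

[middle row of eq. (9) illegible in source]


where, assuming small dynamic fields superposed on a “large” electrostatic
bias, we have used $u(t) \ll h^1$, $\sigma_b(t) \ll \sigma_b^1$. Above,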


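$\chi = (\sigma d/\epsilon)/j\omega(h^1 + d\mu), \quad \mu = \epsilon_0/\epsilon, \quad$ [definition (10c) illegible in source], $\quad Z_K = -K_F/j\omega,$
$Z_T = (R_T + j\omega M_T)/r^2 - Z_K, \quad r = A_T/A_c, \quad Z_E = Z_p + 1/j\omega C^1,$
$C^1 = \epsilon_0 A_c \nu/(h^1 + d\mu), \quad Z_M = j\omega M_c + R_F + (K_c + K_F + K_R)/j\omega,$


$K_{F,R} = \rho_a c^2 A_c^2 \xi/v_{F,R}, \quad M_c \to M_c + (1/3)\rho_a(v_F + v_R)/\xi.$ (10)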


In eqs. (9), $P$, $U$, $I = \dot{\sigma}_b A_c \nu$, and $Y\xi A_c$ are the steady-state complex
amplitudes of the input sound pressure, piston diaphragm velocity, cell
current, and sound port volume velocity, respectively, and $\chi$ is the
electromechanical coupling coefficient. The factor $\nu$ is defined as that
fraction of $A_c$ metallized on the stationary electrode, $\omega$ is the circular
frequency, $v_F$ and $v_R$ are the front and rear acoustic chamber volumes
($v_R$ includes the volume of the stationary electrode holes), and $\rho_a$, $c$
are the density of, and sound wave velocity in, air. Relation (10k) on
$M_c$ allows for the air mass loading on the diaphragm by the “gas
spring” acoustic chambers. In addition, Lagrange’s equations, due to
equilibrium of the bias state, yield the electrostatic stability criterion


$\eta - L/(1 - \eta)^2 = 0,$ (11)
where, for a stable and physically realizable solution,
$L = (\sigma d/\epsilon)^2 \epsilon_0 a^2 \beta \nu/2\gamma S(h + d\mu)^3 \le 4/27,$
$\eta = w/(h + d\mu) \le 1/3,$ (12)


a well-known result.”°
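
Criterion (11) can be rewritten as $\eta(1 - \eta)^2 = L$, whose left side rises monotonically on $0 \le \eta \le 1/3$ and peaks at 4/27. A minimal sketch of solving it by bisection follows; the function name and the sample load are illustrative assumptions, not part of the original analysis.

```python
L_MAX = 4.0 / 27.0  # largest electrostatic load with a stable equilibrium


def equilibrium_eta(load):
    """Solve eta * (1 - eta)**2 = load for the stable root in [0, 1/3]."""
    if not 0.0 <= load <= L_MAX:
        raise ValueError("no stable equilibrium: L must lie in [0, 4/27]")
    lo, hi = 0.0, 1.0 / 3.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # eta * (1 - eta)^2 increases with eta on this interval
        if mid * (1.0 - mid) ** 2 < load:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


eta_limit = equilibrium_eta(L_MAX)
eta_sample = equilibrium_eta(0.114 * L_MAX)  # a load well inside the bound
print(f"eta at the stability limit: {eta_limit:.4f}")
print(f"eta at 11.4% of the limit:  {eta_sample:.4f}")
```

At the limit the piston has travelled one third of the effective gap $h + d\mu$; beyond that load no equilibrium exists and the diaphragm collapses onto the stationary electrode.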
The open-circuit “cell sensitivity” and associated cell source imped- 
ance are defined, respectively, as 


$V = \lim_{Z_p \to \infty} (Z_p I)/P, \qquad Z_s = (VP)/I\,|_{Z_p = 0}.$ (13)

In the absence of small motional impedance terms evident near resonance,
$Z_s^{-1}$ can be shown to reduce to $j\omega C^1$, where $C^1$ is the cell
capacitance in the electrostatic equilibrium state; see eq. (10h). Of
course, (13b) will be employed in calculations. The electret “transmitter
sensitivity” is given by


$V_T = V[Z_p/(Z_p + Z_s/\xi)]A,$ (14)


where $\xi$ accounts for the fact that the cells are in parallel, and $A$ was
defined following eq. (7). Before proceeding, it is noted from Griffin’s
work’® that the thin air film damping $R_F$ is proportional to $1/(h^1)^3$, and
to the cube of the spacing between rows of stationary electrode holes
in each cell as shown in Figs. 1 and 5. This degree of freedom over the
magnitude of $R_F$ will prove most helpful.
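
The loading described by eq. (14) can be estimated away from resonance by taking $Z_s \approx 1/j\omega C^1$. The sketch below does so at 1 kHz under stated assumptions: the function name is hypothetical, and the numbers (3.2 pF per cell for three cells, 100 MΩ in parallel with 11 pF at the preamplifier input, and a gain of 0.9) are the EL2 values quoted elsewhere in the paper.

```python
import math


def loading_ratio(freq_hz, c_cell, n_cells, r_p, c_p, gain):
    """Return V_T / V from eq. (14), with Z_s approximated by 1/(j w C1)."""
    w = 2.0 * math.pi * freq_hz
    z_p = 1.0 / (1.0 / r_p + 1j * w * c_p)   # preamp input impedance, eq. (7)
    z_s = 1.0 / (1j * w * c_cell)            # per-cell source impedance
    return gain * z_p / (z_p + z_s / n_cells)


ratio = loading_ratio(1000.0, 3.2e-12, 3, 1e8, 11e-12, 0.9)
loss_db = 20.0 * math.log10(abs(ratio))
print(f"|V_T / V| at 1 kHz: {abs(ratio):.3f} ({loss_db:.1f} dB)")
```

Under these assumptions the capacitive divider formed by the source and input capacitances, rather than the preamplifier gain, accounts for most of the difference between cell and transmitter sensitivity.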


2.3 Optimization and EL2 design 


In telephony, it is desirable that the transmit frequency response 
rises and peaks at the upper end of the telephone bandwidth, that is, 
between 3 and 4 kHz. This specification on frequency and level, which 
places a side condition on microphone design, may be approximated 
by allowing $M_T$ and $R_T$, i.e., the sound port impedance, to vanish
presently in the expression for $V$. Then, finding the maxima of $|V|$
with frequency in terms of the well-known dynamic magnification
factor,’ $\mathcal{M}(\omega)$, of a mass-spring-damper system can be shown to yield


$\omega_d^2/\omega_n^2 = (1 - \mathcal{M}_M^{-2})^{1/2} = 1 - R_F^2/2M_c^2\omega_n^2,$ (15)


where $\mathcal{M}_M$ is the maximum of $\mathcal{M}$ occurring at the damped resonance
frequency $\omega_d$, and the natural frequency is given by


$\omega_n^2 = (K_c + K_R)/M_c.$ (16)


Since $\mathcal{M}_M$ and $\omega_d$ will be prescribed, (15a) yields a side condition on
$\omega_n$. It is also seen from eq. (15b) that the desired $\mathcal{M}_M$ may be achieved
by realizing the proper $R_F$ as previously indicated. While the analytical
optimization will be described more thoroughly in Ref. 16, most results
are provided here. First, with stability criterion (12a) and side condition
(16), it may be shown that maximum $|V|$ is achieved for $\omega_n^2 = K_c/M_c$,
that is, $v_R = \infty$. A design at $K_c = K_R$ will be 3 dB below maximum
$|V|$, but will require lower film tension. This result is unlike that
reported by Fraim et al.” where the frequency side condition was
treated but not as an integral part of the optimization. Second, it may
be shown that $|V| \propto (h + d\mu)^{1/2}$ owing to a higher electrostatically
stable electret charge level with increased $h$ [see eqs. (11) and (12)] as
seen in Fig. 2. However, since increasing $h$ lowers the source capacitance,
eq. (14) can yield a transmitter sensitivity which reaches a
maximum with $h$, as seen. It may be shown from side condition (16)
that


$S \propto 1/\xi^2.$ (17)


Then, in the stiffness-controlled region it follows that | V | is independ- 

Fig. 2—One-day cell sensitivity, transmitter sensitivity, and corresponding stable
electret charge vs air film thickness for a rectangular design. For all h, S = 60.1 N/m. No 
front chamber is present. 


ent of $\xi$ and thus cell size. Similarly, $|Z_s|$ and $|V_T|$ may be shown
independent of $\xi$, allowing relation (17) to be freely employed in keeping
film stress levels below strength limitations by increasing $\xi$.
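
How the required tension falls with the number of cells can be seen by evaluating side condition (16) with the lumped parameters of eqs. (4), (5), and (10) for several values of $\xi$ at fixed overall area and aspect ratio. The sketch below is a rough check, not the authors' procedure: the function name, the air density of 1.2 kg/m³, and the sound speed of 343 m/s are assumptions, the one-third air-loading factor is as read from eq. (10k), and the remaining defaults are the EL2 values of Section 2.3 with no front chamber.

```python
import math

RHO_AIR = 1.2   # assumed air density, kg/m^3
C_AIR = 343.0   # assumed speed of sound in air, m/s


def required_tension(n_cells, total_area=7.3e-5, aspect=1.74, d=29e-6,
                     v_rear=8.5e-7, f_d=3980.0, m_peak=2.24):
    """Film tension S (N/m) that satisfies side condition (16)."""
    rho = 2.23e3 + 38.6e-4 / d                        # film plus metallization
    kappa = math.pi ** 4 * n_cells / (16.0 * aspect)  # eq. (5)
    gamma = math.pi ** 2 * kappa / 4.0
    beta = 4.0 * n_cells / aspect
    cell_area = total_area / n_cells
    a_sq = cell_area / beta
    # eq. (10k): film mass plus rear-chamber air loading (front chamber absent)
    mass = kappa * rho * d * a_sq + RHO_AIR * v_rear / (3.0 * n_cells)
    k_rear = RHO_AIR * C_AIR ** 2 * cell_area ** 2 * n_cells / v_rear  # eq. (10j)
    # eq. (15a): omega_d^2 / omega_n^2 = sqrt(1 - 1/M_M^2)
    omega_n_sq = (2.0 * math.pi * f_d) ** 2 / math.sqrt(1.0 - 1.0 / m_peak ** 2)
    return (omega_n_sq * mass - k_rear) / gamma       # eq. (16) with K_c = gamma S


for n in (1, 2, 3, 4):
    s = required_tension(n)
    print(f"xi = {n}: S = {s:7.1f} N/m, S * xi^2 = {s * n * n:.1f}")
```

For three cells this reproduces a tension close to the 60.1 N/m quoted for the EL2, and in this sketch the product $S\xi^2$ stays constant as cells are added.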

The design approach is now to use condition (16) to determine $S$ for
a given geometry. Then constraint (12a) determines the stable charge
level possible. Such data were obtained for the EL2 for $\omega^2/\omega_n^2 \ll 1$
while iterating parameters such as $d$, $h$, and $v_R$. In applying (12a), the
upper bound on $L$, i.e., 4/27, was reduced to $\zeta(4/27)$ to allow margin
for (i) worst-case tolerances expected, (ii) anticipated aging effects,
and (iii) a safety factor related to extreme service conditions, modeling
assumptions, etc. Item (ii) is discussed later. The following values,
descriptive of the EL2, were used to generate Fig. 2:


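$\omega_d = 2\pi(3980\ \mathrm{Hz})$,† $\mathcal{M}_M = 2.24$,† $\xi A_c = 7.3 \times 10^{-5}$ m²,
$\Lambda = 1.74$, $\xi = 3$, $\nu = 0.73$, $d = 29$ µm,
$v_R = 8.5 \times 10^{-7}$ m³,
$\zeta = 0.114$, $\rho = (2.23 \times 10^{3} + 38.6 \times 10^{-4}/d)$ kg/m³,
$\epsilon = 2.08\epsilon_0$, $A = 0.9$, $C_p = C_{junction} + C_{stray} = (5 + 6)$ pF. (18)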


The second term in eq. (18j) on $\rho$ is added to the EL2 polymer film
density to account for the metallized electrode mass (film and metallization
are described in Section III), and $\epsilon_0$, $\rho_a$, and $c$ are well known.
For all $h$ in Fig. 2, condition (16) yielded tension $S$ = 60.1 N/m. Curves
such as those in Fig. 2 together with additional physical, material, and
electrical constraints allowed optimum final parameters such as $h$, $d$,
and $v_R$ to be chosen. With the parameters in eq. (18) and choosing
$h$ = 36.1 µm, Fig. 2 gives $\sigma$ = 13 × 10⁻⁵ C/m² ($\sigma d/\epsilon$ = 204 volts) for the
EL2.
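
The stable charge level follows from constraint (12a) once the tension is fixed. The sketch below inverts (12a) at the reduced bound for the quoted EL2 parameters; the function name is hypothetical, and reading the margin factor as 0.114 of the bound 4/27 is an assumption drawn from the parameter list of eq. (18).

```python
import math

EPS0 = 8.854e-12  # permittivity of free space, F/m


def stable_bias(h, d, eps_r, tension, n_cells, aspect, total_area, nu, margin):
    """Return (sigma*d/eps in volts, sigma in C/m^2) at L = margin * 4/27."""
    kappa = math.pi ** 4 * n_cells / (16.0 * aspect)   # eq. (5)
    gamma = math.pi ** 2 * kappa / 4.0
    cell_area = total_area / n_cells                   # equals beta * a^2
    gap = h + d / eps_r                                # h + d*mu, mu = eps0/eps
    load = margin * 4.0 / 27.0
    # eq. (12a) solved for the equivalent bias voltage sigma*d/eps
    v_bias = math.sqrt(load * 2.0 * gamma * tension * gap ** 3
                       / (EPS0 * cell_area * nu))
    sigma = v_bias * eps_r * EPS0 / d
    return v_bias, sigma


v_bias, sigma = stable_bias(h=36.1e-6, d=29e-6, eps_r=2.08, tension=60.1,
                            n_cells=3, aspect=1.74, total_area=7.3e-5,
                            nu=0.73, margin=0.114)
print(f"sigma*d/eps = {v_bias:.1f} V, sigma = {sigma:.3e} C/m^2")
```

Because the bias voltage scales with the 3/2 power of the effective gap at fixed tension, thicker air films permit markedly higher stable charge, which is the trend the paper attributes to Fig. 2.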


2.4 Thermal stabilization and EL2 performance 


Given the above EL2 design parameters, we find that $h^1$ = 35.2 µm
and $R_F$ = 0.032 Ns/m which, together with cell geometry, yield’ the
required spacing between rows of holes in the stationary electrode. 
Then the transmitter performance including its sound port acoustic 
impedance” influence is seen in Fig. 3, where we have used for the EL2 


R, = 10° ohms, ve =0.7X 1077 m?, = Ar = 1.72 x 10°9/E m’, 
Rr= 04 pab/é + 2.53 X 107 *°pa(wo)'/”/E Ns/m, 
Mr = 2.21 X 107°pa/E kg, (19) 


being the kinematic viscosity of air. The calculated response is seen 
to match that typically measured quite favorably across the telephone 
bandwidth. The response, influenced by input acoustics, peaks at
about 3080 Hz, and $|V_T|$ at 1 kHz and $\xi C^1$ are −31.8 dB re 1 V/(N/m²)
and 9.6 pF, respectively. Above the frequency of peak response (asso-
ciated with the first diaphragm resonance), the results begin to differ 
due to the single degree-of-freedom model used for the continuous 
diaphragm. 
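
Several of the quoted one-day figures can be cross-checked from eqs. (6), (10), (11), and (15). The sketch below is such a check under stated assumptions: an air density of 1.2 kg/m³, the design sitting exactly at the reduced stability bound with a margin factor of 0.114, no front chamber, and the one-third air-loading factor as read from eq. (10k). Variable names are illustrative.

```python
import math

EPS0 = 8.854e-12   # permittivity of free space, F/m
RHO_AIR = 1.2      # assumed air density, kg/m^3

n_cells, aspect, total_area = 3, 1.74, 7.3e-5
d, h, eps_r, nu, margin = 29e-6, 36.1e-6, 2.08, 0.73, 0.114
v_rear, f_d, m_peak = 8.5e-7, 3980.0, 2.24


def solve_eta(load):
    """Stable root of eta * (1 - eta)**2 = load on [0, 1/3], by bisection."""
    lo, hi = 0.0, 1.0 / 3.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 - mid) ** 2 < load:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gap = h + d / eps_r                          # h + d*mu
eta = solve_eta(margin * 4.0 / 27.0)         # eq. (11) at the reduced bound
h1 = h - eta * gap                           # eq. (6)

cell_area = total_area / n_cells
c_total = n_cells * EPS0 * cell_area * nu / (h1 + d / eps_r)   # xi * C1, eq. (10h)
z_source = 1.0 / (2.0 * math.pi * 1000.0 * c_total)            # |Z_s / xi| at 1 kHz

kappa = math.pi ** 4 * n_cells / (16.0 * aspect)
beta = 4.0 * n_cells / aspect
rho = 2.23e3 + 38.6e-4 / d
mass = kappa * rho * d * cell_area / beta + RHO_AIR * v_rear / (3.0 * n_cells)
ratio = math.sqrt(1.0 - 1.0 / m_peak ** 2)   # omega_d^2 / omega_n^2, eq. (15a)
omega_n = 2.0 * math.pi * f_d / math.sqrt(ratio)
r_film = mass * omega_n * math.sqrt(2.0 * (1.0 - ratio))       # eq. (15b)

print(f"h1 = {h1 * 1e6:.1f} um, xi*C1 = {c_total * 1e12:.2f} pF")
print(f"|Z_s/xi| at 1 kHz = {z_source / 1e6:.1f} Mohm, R_F = {r_film:.4f} Ns/m")
```

The results land close to the 35.2 µm, 9.6 pF, and 0.032 Ns/m given in the text, and the source impedance magnitude is near the 16 MΩ cited in Section 3.1.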

To this point, we have ignored tension and electret charge variation 
with time; accordingly, all previous results are referred to 1 day past 
manufacture, as represented in Figs. 2 and 3. However, the viscoelastic 
polymer film undergoes stress relaxation and/or creep under load. 
Similarly, extensive investigation has been conducted on electret 
charge retention at Bell Laboratories and will be reviewed in a com- 


† Prescribed in the absence of sound port impedance.
panion article.'* Here, we simply give a curve fit for the assumed EL2
electret charge retention at room conditions normalized to the 1-day
level, $\sigma[1]$:


$\sigma(t)/\sigma[1] = \sum_{i=1}^{6} M_i e^{-t/\tau_i},$ (600 s ≤ t ≤ 20 yrs),

    i     M_i      τ_i (s)
    1     0.954    9.66 × 10¹¹
    2     0.039    1.08 × 10⁹
    3     0.001    2.37 × 10⁷
    4     0.004    3.81 × 10⁶
    5     0.003    2.50 × 10⁵
    6     0.005    1.03 × 10⁴        (20)
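
The curve fit of eq. (20) is straightforward to evaluate. In the sketch below the six weights are as printed, while the powers of ten of the time constants are a reading of a damaged table, chosen so that the series equals unity at one day; treat them as assumptions. The function name is illustrative.

```python
import math

DAY_S = 86400.0
YEAR_S = 365.25 * DAY_S

WEIGHTS = (0.954, 0.039, 0.001, 0.004, 0.003, 0.005)
# Time constants in seconds; exponents are an assumed reading of eq. (20).
TIME_CONSTANTS_S = (9.66e11, 1.08e9, 2.37e7, 3.81e6, 2.50e5, 1.03e4)


def charge_ratio(t_s):
    """Electret charge relative to its 1-day level, valid for 600 s to 20 years."""
    if not 600.0 <= t_s <= 20.0 * YEAR_S:
        raise ValueError("eq. (20) is fitted only for 600 s <= t <= 20 yrs")
    return sum(m * math.exp(-t_s / tau)
               for m, tau in zip(WEIGHTS, TIME_CONSTANTS_S))


for label, t in (("1 day", DAY_S), ("1 year", YEAR_S), ("20 years", 20.0 * YEAR_S)):
    print(f"{label:>8}: sigma/sigma[1] = {charge_ratio(t):.4f}")
```

With these time constants the fit returns about 1.000 at one day and roughly 0.975 at 20 years, in line with the retention figure of 0.97 cited in the text.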


Data indicate the initial electret charge to be about 1 percent above 
$\sigma[1]$, which will be employed here. Since $\sigma/\sigma[1]$ is only down to 0.97 at
20 years, it is not surprising that the diaphragm tension relaxation will 
be of far greater impact. Recently, Wang and Matsuoka™ studied the 
viscoelastic behavior of metallized annealed EL2 film. Through time- 
temperature superposition, they obtained a reduced relaxation modu- 
lus vs reduced time over a 500-year period at room temperature (using 
18,000-s high temperature isotherms, ambient to 75°C). Using a gen-
eralized Maxwell model,” their result for the room temperature relax- 
ation modulus is here expressed numerically by 


$G^o(t) = \sum_{i=1}^{6} G_i e^{-t/\lambda_i},$ (600 s ≤ t ≤ 500 yrs),

    i     G_i (× 10⁷ N/m²)    λ_i (s)
    1     30.11               1.14 × 10¹¹
    2     3.61                1.59 × 10⁹
    3     4.04                1.88 × 10⁸
    4     [illegible]         [illegible]
    5     4.97                2.38 × 10^[illegible]
    6     7.03                6.48 × 10^[illegible]        (21)


where superscript “o” denotes room temperature. To their Arrhenius
plot of the temperature shift factor $a_T$, the Arrhenius equation,


$\ln a_T = 25{,}530[1/(T + 273) - 1/(T^o + 273)],$ (22)


has here been fit, where $T$ is temperature in °C, and $T^o$ = 22.7 °C.

In EL2 manufacture, the diaphragm undergoes a fixed stress $\sigma^m$ for
a short duration, $t = 0$ to $\hat{t}$, during which time the material creeps to
strain $\epsilon^m$. At time $\hat{t}$, the film diaphragm is clamped permanently at
strain $\epsilon^m$. Stress relaxation in accordance with the decreasing modulus
then ensues. While $\sigma^m$ could be chosen such that the proper membrane
tension, $S$, and hence frequency response previously prescribed at 1
day were obtained, unsatisfactory increases in sensitivity and decreases
in resonance frequency would occur during, say, a 20-year service life.
Fig. 3—Typical EL2 frequency response one day after assembly (following thermal
stabilization), at room temperature. 


For this reason,” a “thermal stabilization” procedure has been developed
whereby the electret devices are placed at high temperature $T^*$
for duration $\Delta t = t_2 - t_1$ where $\hat{t} < t_1 < t_2 \le$ 86,400 s (1 day). While the
mechanics of the resulting thermoviscoelastic problem are only
sketched below, the reader may refer to Ref. 16 for more details. For
a thermorheologically simple viscoelastic material undergoing “small”
strain, the following linear constitutive equation holds”’


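$\sigma^m(t) = \sigma^m(\theta) = \int_0^{\theta} G^o(\theta - \theta')\,\frac{\partial \epsilon^m(\theta')}{\partial \theta'}\,d\theta',$

$\theta(t) = \int_0^{t} (1/a_T)\,dt,$ (23)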


where $t$ is the real, and $\theta$ the reduced, time. For $t \gg \hat{t}$, it may be shown
that the creep between $t = 0$ and $\hat{t}$ may be viewed as a step in strain
of magnitude $\epsilon^m$ at $t = 0$. From the convolution integral (23a), it may
be shown using the Dirac-delta function and Laplace transforms that


$\sigma^m(t) = \sigma^m(\theta) = \epsilon^m G^o(\theta), \qquad t \gg \hat{t}.$ (24)


Now, for $\hat{t} \ll t \le t_1$, $\theta = t$ and eq. (24) yields a well-known result.”
During stabilization, eq. (23b) gives $\theta = t_1 + (t - t_1)/a_T(T^*)$, which is
the basis for time-temperature superposition.” For the service life,
$t \ge t_2$,


$\theta = t_1 + (t - t_2) + \Delta t/a_T(T^*).$ (25)


At 1 day, the stress $\sigma^m = S/d$ is prescribed as can be seen from eqs.
(15), (16), and (4b); see the argument following eq. (16). Thus, $\epsilon^m$ may
be determined from eqs. (24) and (25). In a manner similar to that
used in obtaining eq. (24), it can be shown that the actual step change
in $\sigma^m$ at $t = 0$ yields


$\epsilon^m(t) = \sigma^m J^o(t), \qquad 0 < t \le \hat{t}.$ (26)


With the creep compliance $J^o(t)$ known, we can then find from eq.
(26) the required $\sigma^m$ that must be applied to the film for duration $\hat{t}$ to
achieve the desired 1-day tension following stabilization. Microphone
performance for all $t \gg \hat{t}$ (i.e., during the service life) is thus independent
of the choice of a ($\hat{t}$, $\sigma^m$) pair satisfying eq. (26). However, since the
creep compliance increases more rapidly with time initially, manufacturing
tolerances on $\hat{t}$, to maintain a given (acceptable) variance on
$\sigma^m$ for all $t \gg \hat{t}$, are less restrictive for larger $\hat{t}$. Desiring time-independent
transmitter performance through the service life, the following
parameters were chosen for the EL2: $T^*$ = 60°C, $\Delta t$ = (2/3 day), and
$t_2$ = (1 day), yielding $\epsilon^m$ = 0.65 percent. Then, for $\hat{t}$ = 60 s, we use”
$J^o(\hat{t})$ = 10.6 × 10⁻¹⁰ m²/N and obtain the EL2 initial stress $\sigma^m$ = 4.2 ×
10⁶ N/m² or tension, 122 N/m (more than twice the 1-day level), to be
applied. Note that $\sigma^m$ could be lowered by increasing $\hat{t}$. It is interesting
that at 1 day, $\theta$ = (29 years), that is, the equivalent of 29 years of room
temperature relaxation has been effected. Now, with time variations
in both eqs. (20) and (24) present, quantities like $h^1$, $R_F$, and $K_c$ change
accordingly, and the results in Fig. 4 are obtained; EL2 performance
during the service life has been “stabilized.” The 1-kHz transmitter
Fig. 4—Predicted EL2 sensitivity at 1 kHz and frequency of peak response level vs
time at room temperature, stabilized at 60°C for 16 hours. 
sensitivity is seen to drop very slightly during the first year due to
minor but dominant charge decay assumed, after which time slight
tension relaxation causes $|V_T|$ to rise and the resonance frequency to
decline slightly. 
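
The reduced-time bookkeeping of eqs. (22), (23b), and (25) can be sketched as follows for the quoted stabilization schedule (60°C for two thirds of a day, ending at one day). The function names are illustrative, and the piecewise form assumes room temperature everywhere outside the stabilization interval.

```python
import math

DAY_S = 86400.0
YEAR_S = 365.25 * DAY_S


def shift_factor(temp_c, ref_c=22.7):
    """Arrhenius temperature shift factor a_T of eq. (22)."""
    return math.exp(25530.0 * (1.0 / (temp_c + 273.0) - 1.0 / (ref_c + 273.0)))


def reduced_time(t, t1, t2, soak_temp_c):
    """Reduced time theta(t) for a soak at soak_temp_c between t1 and t2."""
    a_t = shift_factor(soak_temp_c)
    if t <= t1:
        return t                                # room temperature: theta = t
    if t <= t2:
        return t1 + (t - t1) / a_t              # during stabilization
    return t1 + (t - t2) + (t2 - t1) / a_t      # service life, eq. (25)


t2 = DAY_S
t1 = t2 - (2.0 / 3.0) * DAY_S
theta_one_day = reduced_time(t2, t1, t2, 60.0)
print(f"1/a_T at 60 C: {1.0 / shift_factor(60.0):.0f}")
print(f"reduced time at 1 day: {theta_one_day / YEAR_S:.1f} years")
```

The soak accelerates relaxation by a factor of roughly 16,000, so that sixteen hours at 60°C stands in for about 29 years at room temperature, matching the figure quoted in the text.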


III. PHYSICAL DESIGN
3.1 Constraints and objectives 


The physical design of the EL2 electret transmitter was guided by 
the following factors: potential applications, reliability, and manufac- 
turability. The primary application for the new electret microphone 
has been the replacement of the AF1 in the 680-type transmitter
module used with the 4A Speakerphone. This application dictated
physical constraints based on the existing module configuration. The
size of the AF1 was an upper bound on that of the EL2. Electrical
compatibility with other expected applications was also a major influ- 
ence on the impedance and dc power requirements for the microphone. 
The low cell capacitance results in a combined cell source impedance
($Z_s/\xi$) of 16 MΩ at 1 kHz. This high impedance necessitates the use of
an internal preamp intimately associated with the cells to reduce
loading of the output signal ($V$), and to minimize external electromag-
netic field pickup. Potential use in line-powered applications required 
the preamplifier to be operational down to a supply voltage of 2 volts. 
Performance objectives were also stipulated to insure satisfactory 
operation in all anticipated applications. Electroacoustic conversion 
efficiency, frequency response, and mechanical and electromagnetic 
interference sensitivity were some of the operational characteristics 
specified. 

The optimized parameters generated with the analytical model [see 
eqs. (18) and (19) for most physical design parameters] provided a 
framework for the physical design. The ultimate configuration reflects 
model results that were in turn influenced by physical, material, and 
telephone constraints. A major effort was made to insure simplicity 
and ease of assembly while stressing environmental reliability and long 
life. Twenty years service within an operating temperature range of 
—23°C to +49°C was the overall design objective. 


3.2 Description 
3.2.1 Subassembly 


The heart of the microphone is the subassembly or “cartridge,” 
shown in the exploded view in Fig. 5. The diaphragm is positioned and 
tensioned across the face of the backplate between the shim and 
clamping plate. The spring clip clamps these parts against the top of 
the backplate while holding the preamplifier in position on the under- 
side. The backplate is the nucleus of this structure with the other parts 
Fig. 5—EL2 subassembly.


assembled to it. An electroplatable grade of acrylonitrile butadiene 
styrene (ABS) is used to mold the backplate to facilitate selective 
metallization of the three microphone cell areas located on its upper 
surface. These conductive areas, which are electrically common, serve 
as the stationary electrode of the microphone configuration. A small 
diaphragm support rib 36 µm high ($h$) surrounds each metallized cell
area. These supports are unmetallized to reduce stray capacitance.
The diaphragm resting on these ribs creates the air film ($h^1$) between
the flexible diaphragm and the stationary electrode. The sensitivity of
the microphone output to changes in this air film necessitates the use
of precision molding techniques to maintain dimensional control of rib
height to within ±5 µm (±0.0002 in). The holes through the backplate
couple the air film under the diaphragm to the rear acoustic chamber 
and are spaced at a distance to effect proper air film damping in 
accordance with the model in Section II. They are located in rows at 
the outer edges of the cells adjacent to the ribs. This maximizes the 
amount of continuous electrode area ($\nu A_c$) in the center region where
the greatest diaphragm displacement (u) occurs. 

The preamplifier is a conventional source follower configuration 
with a JFET transistor and two thick film resistors on a ceramic 
substrate providing an effective impedance transformation and a voltage
gain of −1 dB. The input impedance ($Z_p$) is approximately 100 MΩ
resistance in parallel with about 5 pF of junction and 6 pF of stray
capacitance. The output impedance is nominally 1 kΩ. A contact
spring soldered to one end of the substrate provides electrical continuity
from the stationary electrode on the backplate to the input of
the amplifier circuit, while mechanically clipping the preamp to the
backplate. Three leads extending from the opposite end of the sub- 
strate provide for electrical bias, output, and common connections to 
the assembled microphone. Two small chip capacitors bridge these 
leads on the substrate to provide RFI suppression. 

The electret diaphragm material is a 29-µm thick, stress-relieved,
cast, poly(tetrafluoroethylene) (PTFE) film. It is metallized on one side
with a titanium flash approximately 100 Å thick under 2000 Å of
evaporated gold.’ The metallized side of the film functions as the
movable electrode in the microphone. This particular Teflon film was
selected for its superior charge retention characteristics. It is electrostatically
charged to approximately 13 × 10⁻⁵ C/m² to provide the
electric field ($E_3^1$) in the microphone assembly air film. The charge
($-\sigma$) on the polymer film eliminates the need for the external dc bias
required in conventional condenser microphones. The diaphragm is
tensioned longitudinally under fixed load $S$ = 122 N/m for duration $\hat{t}$
= 60 s minimum prior to clamping it in the relative position indicated,
and with the metallized side opposite the air film. The nominal
value of $\hat{t}$ chosen is influenced by two primary considerations. The
duration $\hat{t}$ should be maximized to minimize its tolerance stringency
[see the discussion following eq. (26)], whereas it should be minimized
for the sake of manufacturing expediency.

A thin Mylar† shim is sandwiched between the backplate and
clamping plate along with the diaphragm film. It provides a relatively
compliant surface against which to clamp the film. “Hard” clamping 
the diaphragm without a shim would result in nonuniform clamping 
pressure, causing nonuniform tension across the width of the film. The 
tab on the end of the shim provides electrical insulation between the 
preamp contact spring and the enclosure in the final assembly. 

The clamping plate is a metallized precision molded piece part which 
is indexed on the backplate and clamps the pretensioned diaphragm in 
position across the cell area. Both the clamping plate and backplate 
are molded from the same grade of ABS material so as to have a similar 
thermal expansion coefficient. This coefficient also approximates that 
of the electret film. The clamping plate is metallized to provide 
electrical continuity from the metallized side of the electret film to the 
spring clip, which is connected directly to the common terminal on the 
preamp. 

The phosphor bronze spring clip maintains a compressive spring 
force of approximately 20 N on the assembled parts to keep the 
diaphragm properly tensioned. This force is sufficiently large to pre- 
vent slippage of the diaphragm in the cartridge assembly, but is limited 
to avoid overstressing and causing cold flow of the plastic parts. The 
clip also provides electromagnetic shielding by surrounding most of 
the subassembly and being electrically connected to the preamp com- 
mon terminal. After clamping with the spring clip, the subassembly is 
a working microphone configuration. The spring clip design permits 
easy disassembly to facilitate repair or reassembly if required prior to 
final assembly. 

The completed cartridge assembly is thermally stabilized, which 
involves “soaking” the units at an elevated temperature of 60°C for a 
period of 16 hours to accelerate the diaphragm stress relaxation. This 
preaging treatment results in a more uniform and stable product (see 
Fig. 4). Preliminary testing is done after stabilization to evaluate 
microphone performance prior to final assembly. 


3.2.2 Final assembly 


Final assembly involves ferruling the pretested subassembly into a 
rectangular aluminum enclosure and back cover. A stainless steel 
woven wire acoustic screen is included in front of the subassembly and 
a gasket across the back, as shown in Fig. 6. The screen, in conjunction 
with the input sound port, provides acoustic impedance elements Rr 
and Mr which modify the frequency response. The wire screen also 
serves as a dirt shield across the input sound port. The gasket provides 
electrical insulation between the preamp and back cover and functions 
as a shock-absorbing element in the microphone assembly. The external
aluminum enclosure is electrically connected to the common terminal on
the preamp, thereby providing additional electromagnetic shielding.
Acoustic leaks around the leads are sealed, and the complete unit is
tested according to the final requirements. A cross section of the
completed EL2 microphone is illustrated in Fig. 7 to provide an overall
perspective of the assembly, which measures 18 by 12 by 7.6 mm in
depth.

Fig. 6. EL2 final assembly.


IV. MICROPHONE PERFORMANCE 
4.1 Performance characteristics 


The EL2 operates on a bias supply of 2 to 16 V while drawing about
150 μA. Figure 3 shows a typical measured frequency response of the
telephone transducer compared to the calculated result from the
model. The response is relatively flat in the lower frequency
stiffness-controlled region and rises about 7 dB to the resonance peak
at 3200 Hz, above which the level falls off rapidly. The typical
electroacoustic sensitivity at 1 kHz is −32.0 dBV per N/m². Model
results show that the output signal level is altered by
(uncompensated) changes in a number of the basic microphone
parameters. For example, the variation of output voltage with charge
for this microphone configuration is 0.7 dBV per 10⁻⁵ C/m². This
relationship was verified both analytically and empirically. EL2
sensitivity also varies with air film thickness, with a gradient of
−1.4 dBV per 5 μm, which substantiates the need for precision molding
of the diaphragm support ribs. The microphone output variation with
applied tension is −0.5 dBV per 8 N/m. The resonance frequency also
varies with tension, by 100 Hz per 8 N/m. These coefficients are
linear approximations and apply only for minor variations about the
nominal values specified for the basic parameters.

Fig. 7. EL2 Transmitter Unit.
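The linearized coefficients quoted above can be collected into a small
calculation. The following is only a sketch under stated assumptions:
the function and dictionary names are hypothetical, the coefficients
are the ones quoted in the text, and the check of the charge
coefficient assumes that output voltage is simply proportional to the
electret charge about the nominal 13 × 10⁻⁵ C/m².

```python
import math

NOMINAL_DBV = -32.0  # dBV re 1 V per N/m^2 at 1 kHz, from the text

# Linearized gradients quoted in the text, expressed per SI unit.
COEFFS = {
    "charge": 0.7 / 1e-5,      # dB per (C/m^2)
    "air_film": -1.4 / 5e-6,   # dB per metre of air film thickness
    "tension": -0.5 / 8.0,     # dB per (N/m)
}

def sensitivity_dbv(d_charge=0.0, d_air_film=0.0, d_tension=0.0):
    """Sensitivity after small parameter changes (linear approximation)."""
    return (NOMINAL_DBV
            + COEFFS["charge"] * d_charge
            + COEFFS["air_film"] * d_air_film
            + COEFFS["tension"] * d_tension)

def dbv_to_mv_per_pa(dbv):
    """Convert dBV re 1 V per N/m^2 into millivolts per pascal."""
    return 1e3 * 10.0 ** (dbv / 20.0)

nominal_mv = dbv_to_mv_per_pa(NOMINAL_DBV)
plus_charge = sensitivity_dbv(d_charge=1e-5)
# Assumed proportionality check: raising the charge from 13 to 14 units.
charge_check_db = 20.0 * math.log10(14.0 / 13.0)

print(f"nominal sensitivity: {nominal_mv:.1f} mV/Pa")
print(f"after +1e-5 C/m^2:   {plus_charge:.1f} dBV")
print(f"proportional-charge estimate: {charge_check_db:.2f} dB")
```

The proportional-charge estimate comes out close to the quoted 0.7 dB,
which suggests the charge coefficient is mostly the direct scaling of
output with charge. As the text notes, these linear coefficients should
only be applied to small excursions about the nominal design.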

The vibration sensitivity of the microphone is < −40 dB re 1 V/g at
1000 Hz. The EL2 “signal voltage to spurious mechanical noise voltage”
ratio is about 20 dB greater than that of the AF1 microphone. This
improved vibration isolation results from the relatively low mass of
the electret diaphragm. The RFI sensitivity is < −75 dB re 1 V in a
10 V/m field from 0 to 300 MHz. The hum or low-frequency
electromagnetic pickup sensitivity is < −95 dB re 1 V/gauss at 60 Hz.
The hum interference rejection of the electret is 35 dB better than
that of the AF1. In general, the electret microphone successfully met
all the performance objectives initially outlined and performs well as
a replacement for the AF1 in the 4A Speakerphone.


4.2 Reliability 


The ruggedness and reliability of the EL2 design have been established
by tests that included mechanical and thermal shock, vibration, and
combinations of temperature and humidity. The small changes which
occurred during the course of the tests had minimal effect on overall
performance. The output was found to vary slightly as a function of
temperature at a rate of 0.045 dB/°C. This is a reversible effect,
resulting in no permanent change in output upon return to ambient
temperature. The charge retention of the EL2 electret film, found to
be very stable, is discussed in detail in a companion article.
Moreover, earlier studies over 800 days of testing compared
sensitivity variations (aging) of numerous commercial and Bell Labs
prototype electret microphones to transducers employing other
mechanisms including granular carbon, electromagnetic, and
piezoelectric. The aging conditions ranged between 25°C/40% R.H. and
65°C/90% R.H. The electret devices performed comparably to the other
types of microphones in these tests.


V. SUMMARY 


An EL2 electret transmitter has been developed which provides 
lower sensitivity to spurious electromagnetic and mechanical signals 
than the AF1 magnetic transmitter it has replaced in the 4A Speaker- 
phone. It offers lower dc power consumption, smaller size, and lower 
intrinsic noise and distortion than the carbon transmitter. These 
attributes make it a strong candidate to replace the carbon transmitter 
in future electronic and special-purpose sets. 

An electro-mechano-acoustic model of an electret transmitter gov- 
erning its coupled electrostatic/dynamic operation has allowed EL2 
parameters such as air film thickness, number of cells, and acoustic 
chamber volume to be collectively optimized. It is shown that, within 
bounds, sensitivity can actually be raised by designing with increased 
air film thicknesses. The analysis is novel in consistently treating a 
side condition governing the prescribed response resonance character- 
istics as an integral part of the optimization. Also, a thermal stabili- 
zation procedure used to minimize the effects of long-term stress 
relaxation (accompanied by charge decay) is described theoretically 
and optimized. Virtually constant sensitivity over the service life is 
shown possible. 

These results, coupled with physical, material, and application con- 
straints, are shown applied in the physical design to achieve desired 
performance and reliability. A metallized Teflon electret film is ten- 
sioned, supported, and clamped above a selectively metallized station- 
ary electrode forming three cells acoustically and electrically in par- 
allel. A preamplifier and input sound port complete the subassembly, 
which is then tested and housed in a rectangular aluminum enclosure, 
electrically shielding the transducer. The nominal EL2 electroacoustic
sensitivity, output impedance, response resonance frequency, and
required dc supply are −32 dBV per N/m², 1 kΩ, 3200 Hz, and +2 to
16 V, respectively. The response is relatively flat in the
low-frequency stiffness-controlled region. A 3-dB cutoff occurs below
100 Hz.

For completeness, note that the EL2 is described here as it existed in
the initial design and field trial units and does not reflect minor
changes as a result of manufacturing experience. A companion article
will treat the new technological aspects of the EL2.

APPENDIX

Partial List of Symbols

−σ₀                     Negative electret surface charge per unit area.
h′                      Air film thickness in the electrostatic state.
h                       Rib height.
d                       Polymer film thickness.
LL = ε₀/ε               Inverse of relative permittivity.
Mc, Ac, Kc              Diaphragm effective mass, area, and stiffness
                        per unit cell.
S                       Membrane diaphragm's applied force per unit
                        edge (tension).
E                       Number of diaphragm cells.
Ur, Kr                  Rear acoustic chamber volume, stiffness.
Rr                      Air film damping coefficient.
γ                       That fraction of Ac metallized on the rear
                        electrode.
w, u(t)                 Diaphragm electrostatic and dynamic
                        displacement.
V, Zs                   Cell open-circuit sensitivity and source
                        impedance.
C′                      Electrostatic cell capacitance.
Zp = (1/Rp + jωCp)⁻¹    Preamplifier input impedance.
V>                      Transmitter (microphone) sensitivity.
ωn, ωd                  Diaphragm natural and damped resonance
                        frequency (in absence of front chamber).
Mn                      Maximum of the dynamic magnification factor
                        (in absence of front chamber).
GF (t)                  Room temperature mechanical relaxation modulus
                        of the metallized film diaphragm.
σ(t), ε(t)              Diaphragm stress and strain.
t, 6                    Real and reduced time.
t̄                       Time duration of diaphragm creep prior to
                        clamping.
σ̄, ε̄                    Diaphragm stress and strain at time t = t̄.


Comparison of Single Heterostructure and
Double Heterostructure GaAs-GaAlAs LEDs for 
Optical Data Links 


By A. K. CHIN, G. W. BERKSTRESSER, and V. G. KERAMIDAS 
(Manuscript received February 22, 1979) 


This paper presents a theoretical and experimental comparison of 
the performance of single heterostructure (SH) and double
heterostructure (DH) GaAs-GaAlAs light-emitting diodes (LEDs). These
LEDs are designed for optical data links operating at rates up to T3 (45
Mb/s). The SH LEDs were optimized with respect to active layer 
carrier concentration and thickness; similar DH LED active layer 
parameters have not yet been optimized. We find experimentally that 
the DH diodes launch approximately 8 times the SH power into a butt- 
coupled, 0.36 numerical aperture (NA), graded-index fiber. Using a 
diffusion model, we show that the power output of the SH LED is 
limited by surface recombination and reduced current crowding. 
These results demonstrate that DH LEDs are necessary for applica- 
tions requiring high launched power. 


I. INTRODUCTION


Burrus-type,¹ single heterostructure (SH) light-emitting diodes
(LEDs) are presently fabricated in our laboratory for use in optical
data links. Their performance requirements are 45 MHz bandwidth and a
minimum butt-coupled power of 10 μW into a 0.36 numerical aperture
(NA), 55-μm core, graded-index optical fiber at 60 mA forward current.
SH LEDs were chosen for the simple growth procedure² and exceptional
reliability.³ After optimization of the active layer width and carrier
concentration for maximum efficiency while maintaining the required
bandwidth, the SH LEDs butt-couple 10 to 15 μW of power into the
fiber. However, initially fabricated, unoptimized double
heterostructure (DH) LEDs were found to couple 7 to 8 times more power
than the optimized SH LEDs. The purpose of this paper is to explain
the difference between the SH and DH LED performance. We begin with an
explanation of the factors influencing the efficiency and bandwidth of
the two structures.

Figure 1 is a schematic of our Burrus-type SH and DH LEDs. These
LEDs use current crowding by contact area restriction to increase
current density for increased brightness. In addition, the localized
emission region couples the light more efficiently into the fiber. The
contact diameter for both structures is chosen to be 50 μm, slightly
smaller than the fiber core, for our performance comparison.

Figure 1a shows the design of our SH LED. The active layer is
confined on one side by the n-GaAlAs/p-GaAs heterojunction and on
the other side by the p-ohmic contact. Since the injected carrier
concentration is zero at the p-contact, the p-n junction (i.e., the
minority carrier injection source) should be kept at least a minority
carrier diffusion length from the contact to minimize nonradiative
recombination. However, as the p-n junction is removed in distance
from the p-contact, current crowding decreases. Thus, for the SH LED,
the active layer width is a compromise between the effects of current
crowding and nonradiative recombination at the contact. Five
micrometers, roughly the electron diffusion length in the p-active
layer, is found experimentally to be the optimum active layer
thickness.

Fig. 1. (a) Schematic of Burrus-type SH LED. (b) Schematic of
Burrus-type DH LED.

For the SH LED, the fundamental bandwidth is set by the active layer
minority carrier lifetime, which depends strongly on the doping
density.⁴ For increased doping density, the minority carrier lifetime
decreases (i.e., bandwidth increases), but the internal quantum
efficiency of the LED also decreases.⁴ To optimize the SH LED
performance, the active layer doping density may be chosen so that the
minority carrier lifetime is longer than the value necessary to meet
the bandwidth requirement. Nonradiative recombination at the
p-contact, a result of the compromise in active layer thickness,
increases the bandwidth to the required value.

Applying analysis similar to that used for the SH LED, the DH LED
is found to be more efficient for the following reasons. First, as seen
in Fig. 1b, the active layer of the DH LED is bounded by
heterojunctions which have low interface recombination velocities.
Second, very thin active and p-GaAlAs carrier confinement layers can
be grown so as not to decrease current crowding. For a thickness of
0.7 μm for the active layer and 2 μm for the p-GaAlAs layer, the
improved current crowding and carrier confinement of the DH LED result
in roughly a factor of 2.5 superior performance compared to the SH
LED. Next, when the injected carrier density exceeds the active region
doping density, the radiative recombination rate increases with the
current density.⁵ This condition, referred to as conductivity
modulation or bimolecular recombination, can easily occur only in the
DH LED with adequate carrier confinement. As a result, the DH active
layer doping density can be chosen at a value well below that
necessary to meet the speed requirement to take advantage of the
higher internal efficiency; conductivity modulation is used to obtain
the higher bandwidth. At the lower doping density (1 × 10¹⁷ cm⁻³), the
DH LED has twice the internal quantum efficiency of the higher doped
(1 × 10¹⁹ cm⁻³) SH LED.⁴ Our analysis thus accounts for a total factor
of 5 between the performance of the SH and DH LEDs.

Detailed discussion of the above points is contained in the subse- 
quent sections. 


II. MATERIAL-RELATED PARAMETERS

In this section, we discuss the selection of active layer hole
concentration for ~70-MHz bandwidth SH and DH LEDs and the effect of
the hole concentration on the internal efficiency of the LEDs.

2.1 Lifetime 


For T3 data rate (45-Mb/s) optical communication systems, the LED
should have an approximate 70-MHz bandwidth (pulse decay time τ ≈
2.3 ns).* The higher speed is needed to allow for receiver response
time and dispersion in the optical fiber.⁶ To obtain this large
bandwidth, the active layer must be chosen with a minority carrier
lifetime equivalent to the required bandwidth, or the device structure
must be appropriately designed to increase the speed. For an SH LED,
the device time constant can only be reduced by nonradiative
recombination, e.g., surface recombination. On the other hand,
nonradiative recombination or bimolecular recombination (conductivity
modulation) may be used to increase the bandwidth of a DH LED.
Bimolecular recombination also minimizes the loss due to interface
recombination by reducing the diffusion length.
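The pairing of a 70-MHz bandwidth with a 2.3-ns decay time can be
checked with a short calculation. This is a sketch that assumes a
single-exponential (single-pole) light response, for which the 3-dB
bandwidth is 1/(2πτ). The paper does not state which bandwidth
definition it uses, so the function names and that relation are
assumptions. The fall-time helper uses the standard ln 9 factor for an
exponential decay.

```python
import math

def bandwidth_from_decay_time(tau_s):
    """3-dB bandwidth (Hz) of an assumed single-exponential response."""
    return 1.0 / (2.0 * math.pi * tau_s)

def fall_time_90_10(tau_s):
    """90%-to-10% fall time of an exponential decay: tau * ln(9)."""
    return tau_s * math.log(9.0)

tau = 2.3e-9  # pulse decay time quoted in the text, seconds
f_3db = bandwidth_from_decay_time(tau)
t_fall = fall_time_90_10(tau)

print(f"bandwidth ~ {f_3db / 1e6:.0f} MHz")
print(f"90%-10% fall time ~ {t_fall * 1e9:.2f} ns")
```

Under this assumption the 2.3-ns decay time corresponds to a bandwidth
just under 70 MHz, consistent with the figure quoted in the text.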

Figure 2 shows the decay time of DH LEDs as a function of active
layer hole concentration. This figure is taken from Nelson.⁴ The decay
times shown are essentially structure-independent, since the effects of
surface recombination and conductivity modulation were minimized.
These values are somewhat larger than the minority carrier lifetime
because of the effects of photon recycling, but they can be considered
to be minimum bulk lifetimes.⁴

From Fig. 2, the active layer hole concentration of SH LEDs should
be chosen in the range p = 8 × 10¹⁸ cm⁻³ to have a 70-MHz bandwidth.
Due to the trade-offs between current crowding and surface
recombination, the range p = 6 × 10¹⁸ cm⁻³ is allowable. In optimizing
the active layer width, an increase in bandwidth and a reduction in
efficiency due to surface recombination are also obtained. For 70-MHz
DH LEDs, a lower active layer hole concentration range
(p = 10¹⁷ cm⁻³) can be used. A hole concentration below 10¹⁸ cm⁻³ is
desirable for increasing the internal efficiency, while bimolecular
recombination, which maintains bulk efficiency, is used to decrease
the decay times from the values shown in Fig. 2.


2.2 Internal efficiency 


Figure 3 shows the bulk efficiency of p-GaAs decreasing rapidly for
p > 10¹⁸ cm⁻³ at both high and low carrier injection. This figure is
also taken from Nelson,⁴ and the details of obtaining the data may be
found there. For both low and high electron injection, the bulk
efficiency at p = 1 × 10¹⁷ cm⁻³ is approximately twice that at
p = 1 × 10¹⁹ cm⁻³. The decrease in bulk efficiency at high doping
density is presumably due to the introduction of nonradiative
recombination centers.⁴ Thus, since the active layer of SH LEDs must
be doped approximately 10¹⁹ cm⁻³ to obtain a 70-MHz bandwidth while
only 10¹⁷ cm⁻³ hole concentration is required for DH LEDs, the DH LED
should have twice the efficiency of the SH LED.


* The pulse decay time (τ) is the time for the LED light output to
decay to 1/e of its peak value. τ is related to the 90-percent to
10-percent fall time t(90%-10%) by the relation t(90%-10%) = 2.2τ.


1582 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1979


[Figure 2: decay time in seconds versus hole concentration in cm⁻³ for
GaAs at 300°K; data points labeled DH, Casey et al., Hwang and
Dyment, and Acket et al.]

Fig. 2. Variation of photo-luminescent decay times with doping level
(from Ref. 4).


III. STRUCTURAL PARAMETERS
3.1 Effect of surface recombination and self absorption 


Figure 4 shows a DH structure and two SH structures. Following Lee
and Dentai,⁷ we estimate the effects of surface recombination and
self-absorption by using a one-dimensional diffusion model. At steady
state, the one-dimensional continuity equation is given by

    D \frac{d^2 n}{dx^2} - \frac{n}{\tau} = 0,                      (1)

where n is the excess electron density, D is the electron diffusivity,
and τ is the bulk electron lifetime.⁷ As shown in the figure, the
boundary conditions for each of the three structures are

    DH:
    -D \left.\frac{dn}{dx}\right|_{x=0} = \frac{J}{q} - s\,n(0)     (2)
    -D \left.\frac{dn}{dx}\right|_{x=w} = s\,n(w)                   (3)

    SH₁:
    -D \left.\frac{dn}{dx}\right|_{x=0} = \frac{J}{q}               (4)
    -D \left.\frac{dn}{dx}\right|_{x=w} = s\,n(w)                   (5)

    SH₂:
    -D \left.\frac{dn}{dx}\right|_{x=0} = \frac{J}{q} - s\,n(0)     (6)
    n(w) = 0,                                                       (7)

where w is the active layer width, s is the interfacial recombination
velocity at the heterojunction interfaces, q is the electronic charge,
and J is the current density at the p-n junction. Table I lists n(x),
the solution to the diffusion equation, for the three structures.
Taking into account self-absorption
in the active region, the light intensity (P) from the p-n junction side
of the LED is given by

    P = \frac{h\nu}{2\tau_r} \left\{ \int_0^w n(x) e^{-\alpha x}\,dx
        + R e^{-\alpha w} \int_0^w n(x) e^{\alpha x}\,dx \right\}.   (8)

The second term in the equation is the reflected power from the
contact. Absorption losses due to the additional layer for the DH and
SH₁ case are neglected. The emission frequency is ν, the contact
reflectivity is R, the radiative lifetime is τ_r, and the absorption
coefficient is α. The external efficiency of the LED is defined by

    \eta = \frac{P}{h\nu} \left( \frac{q}{J} \right).                (9)

Table I also lists η for the three structures.

Comparing the solutions for SH₁ and DH diodes, only a small
difference is noted for sL/D < 1. In the range of thickness to which
these equations are applied, the discussions concerning interface
recombination and self-absorption for the DH device refer also to the
SH₁ device. The one disadvantage of the SH₁ structure relative to the
DH structure is the lack of carrier confinement at the homojunction.
Thus SH₁ is equivalent to DH under conditions where conductivity
modulation is not important, but becomes less efficient at high
carrier injection. The SH LED discussed in the following sections
refers only to SH₂.

Figure 5a displays the calculated SH external efficiency as a function
of the active layer width for several diffusion lengths; Fig. 5b shows
similar curves for the DH case. The calculation parameters are listed
on the figures. Values for 1 × 10¹⁷ cm⁻³ and 1 × 10¹⁹ cm⁻³ p-doping
are chosen for the DH and SH case, respectively, for comparison with
fabricated devices.

The fabricated SH LEDs are analyzed using the 5-μm SH efficiency
curve in Fig. 5a, since a 5-μm minority carrier diffusion length was
measured on 10¹⁹ cm⁻³ p-type GaAs using electron-beam-induced
current (EBIC).⁸ A peak in the SH efficiency results from the high
p-contact surface recombination for small w and self-absorption for
large w. The peak efficiency of 0.48 at w = 12 μm is inconsistent with
the experimentally determined optimum layer thickness of 5 μm. Better
agreement between calculation and experiment is found when current
crowding is considered.

For the DH case, an 11-μm diffusion length is derived from the device
decay time of ~3 ns. The high output power of the DH LED with
w = 0.5 μm is consistent with the high calculated efficiency shown in
Fig. 5b. This consistency is maintained when current crowding is
considered.
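The efficiency calculation of this section can be sketched numerically.
The code below is an illustration under stated assumptions, not the
authors' program. It takes the continuity equation and boundary
conditions in the form -D n'(0) = J/q - s n(0) at the junction, with
either -D n'(w) = s n(w) (heterojunction) or n(w) = 0 (p-contact) at
the far side, and it takes the emitted power as a direct term plus a
reflected term weighted by R exp(-αw) exp(αx). It further assumes that
the radiative lifetime equals the bulk lifetime τ = L²/D (unity
internal efficiency). All function names are hypothetical, and the
parameter values are those quoted for Fig. 5.

```python
import numpy as np

def profile(x, w, L, D, s, back):
    """q*n(x)/J for n(x) = A*cosh(x/L) + B*sinh(x/L); junction at x = 0.

    back="hetero":  -D n'(w) = s n(w)   (heterojunction, DH case)
    back="contact": n(w) = 0            (p-contact, SH2 case)
    """
    ch, sh = np.cosh(w / L), np.sinh(w / L)
    front = [s, -D / L]                      # -D n'(0) = J/q - s n(0)
    if back == "hetero":
        rear = [D * sh / L + s * ch, D * ch / L + s * sh]
    else:
        rear = [ch, sh]
    A, B = np.linalg.solve(np.array([front, rear]), np.array([1.0, 0.0]))
    return A * np.cosh(x / L) + B * np.sinh(x / L)

def trapezoid(y, x):
    """Plain trapezoidal rule, to avoid version-specific NumPy names."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def external_efficiency(w, L, D, s, alpha, R=1.0, back="hetero", npts=4001):
    """External efficiency (q/J)(P/h nu), assuming tau_r = L**2 / D."""
    x = np.linspace(0.0, w, npts)
    n = profile(x, w, L, D, s, back)
    tau_r = L**2 / D                         # assumption, see lead-in
    direct = trapezoid(n * np.exp(-alpha * x), x)
    reflected = R * np.exp(-alpha * w) * trapezoid(n * np.exp(alpha * x), x)
    return (direct + reflected) / (2.0 * tau_r)

UM = 1e-4  # one micrometre in cm; all lengths are in cm
eta_sh = external_efficiency(12 * UM, 5 * UM, D=40.0, s=2000.0,
                             alpha=1000.0, back="contact")
eta_dh = external_efficiency(0.5 * UM, 11 * UM, D=165.0, s=2000.0,
                             alpha=7000.0, back="hetero")
print(f"SH2, w = 12 um, L = 5 um:   eta = {eta_sh:.2f}")
print(f"DH, w = 0.5 um, L = 11 um:  eta = {eta_dh:.2f}")
```

With these assumptions the SH₂ value at w = 12 μm lands close to the
0.48 peak quoted above. Setting s = 0 and α = 0 with R = 1 in the
heterojunction case should return unity, which is a convenient sanity
check on the boundary conditions.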


3.2 Device bandwidths 


The effective device lifetime can be obtained from the carrier
concentration by averaging over the active region:⁷

    \tau_{eff} = \frac{q}{J} \int_0^w n(x)\,dx.                     (10)

τ_eff/τ is listed in Table I for the three structures of Fig. 4.

[Figure 5: external efficiency versus p-layer thickness in μm.
(a) D = 40 cm²/s, s = 2000 cm/s, α = 1000 cm⁻¹, R = 1.
(b) D = 165 cm²/s, s = 2000 cm/s, α = 7000 cm⁻¹, R = 1.]

Fig. 5. (a) SH efficiency vs p-layer thickness for indicated electron
diffusion lengths. Material parameters are for 10¹⁹ cm⁻³ doping.
(b) DH efficiency vs p-layer thickness for indicated electron
diffusion lengths. Material parameters are for 10¹⁷ cm⁻³ doping.

Figure 6a plots the calculated values of τ_eff/τ as a function of
p-layer thickness for the SH₂ LED. The material parameters in the
calculation are those of 10¹⁹ cm⁻³ Ge-doped GaAs for comparison with
fabricated SH LEDs. The diffusion length is varied to show the effect
of surface recombination. For a 5-μm diffusion length and p-layer
thickness of 5 μm, surface recombination reduces the decay time by a
factor of 0.35. Using the 1- to 2-ns decay time for 10¹⁹ cm⁻³ GaAs
from Fig. 2, a device decay time of 0.35 to 0.7 ns is obtained. This
value appears to be in disagreement with the measured value of 2.5 ns,
but the measured device lifetime is probably the result of photon
recycling in the thick p-layer.⁴

Fig. 6. (a) Normalized SH effective decay time vs p-layer thickness
for indicated electron diffusion lengths. Material parameters are for
10¹⁹ cm⁻³ doping. (b) Normalized DH effective decay time vs p-layer
thickness for indicated electron diffusion lengths. Material
parameters are for 10¹⁷ cm⁻³ doping.

Figure 6b plots the calculated values of τ_eff/τ as a function of
p-layer thickness for the DH LED. The material parameters of
10¹⁷ cm⁻³ Ge-doped GaAs are used in the calculation for comparison
with fabricated devices. The diffusion length is again varied to show
the effect of surface recombination. A diffusion length of 11 μm is
obtained from the bimolecular recombination time constant of 7.3 ns
and a diffusivity of 165 cm²/s.⁹ The bimolecular recombination time
for our DH LEDs is given by

    \tau_{BR} = \left( \frac{BJ}{qw} \right)^{-1/2},                (11)

where B = 5 × 10⁻¹¹ cm³/s is the recombination probability, w = 0.5
μm is the active layer width, and J = 3000 A/cm² is the current
density. τ_BR is used in the analysis rather than the decay time in
Fig. 2, since the injected carrier density

    \Delta n = \left( \frac{J}{qBw} \right)^{1/2}
             = 2.8 \times 10^{18}\ \mathrm{cm^{-3}}                 (12)

is an order of magnitude greater than the doping concentration. From
Fig. 6b, at 0.5-μm active layer width, τ_BR is reduced by a factor of
0.63 due to interface recombination to give a device decay time of
~4.5 ns. This value is consistent with the ~3.5-ns decay time measured
on DH LEDs at 60 mA (J = 3000 A/cm²) forward bias.
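The numbers in this section can be reproduced with a few lines. The
sketch below is hedged in the usual way: the function names are
hypothetical, eq. (11) is taken as τ_BR = (BJ/qw)^(-1/2), eq. (12) as
Δn = (J/qBw)^(1/2), the diffusion length as L = (Dτ)^(1/2), and the
lifetime ratio as the integral of n(x) over the active layer with the
same boundary conditions assumed in Section 3.1.

```python
import math
import numpy as np

Q = 1.602e-19   # electronic charge, C
UM = 1e-4       # one micrometre in cm; lengths are in cm throughout

def tau_bimolecular(B, J, w):
    """Eq. (11): bimolecular recombination time in seconds."""
    return math.sqrt(Q * w / (B * J))

def injected_density(B, J, w):
    """Eq. (12): injected carrier density in cm^-3."""
    return math.sqrt(J / (Q * B * w))

def lifetime_ratio(w, L, D, s, back):
    """tau_eff / tau from the diffusion model, with J/q normalized to 1.

    back="hetero" uses -D n'(w) = s n(w); back="contact" uses n(w) = 0.
    """
    ch, sh = math.cosh(w / L), math.sinh(w / L)
    front = [s, -D / L]                      # -D n'(0) = J/q - s n(0)
    if back == "hetero":
        rear = [D * sh / L + s * ch, D * ch / L + s * sh]
    else:
        rear = [ch, sh]
    A, Bc = np.linalg.solve(np.array([front, rear]), np.array([1.0, 0.0]))
    tau_eff = L * (A * sh + Bc * (ch - 1.0))  # integral of n over 0..w
    return tau_eff * D / L**2

tau_br = tau_bimolecular(B=5e-11, J=3000.0, w=0.5 * UM)
dn = injected_density(B=5e-11, J=3000.0, w=0.5 * UM)
L_dh = math.sqrt(165.0 * tau_br)
ratio_dh = lifetime_ratio(0.5 * UM, L_dh, 165.0, 2000.0, "hetero")
ratio_sh = lifetime_ratio(5 * UM, 5 * UM, 40.0, 2000.0, "contact")

print(f"tau_BR = {tau_br * 1e9:.1f} ns, delta_n = {dn:.2e} cm^-3")
print(f"L = {L_dh / UM:.1f} um")
print(f"tau_eff/tau: DH = {ratio_dh:.2f}, SH2 = {ratio_sh:.2f}")
print(f"DH device decay time ~ {ratio_dh * tau_br * 1e9:.1f} ns")
```

Under these assumptions the calculation returns values close to the
7.3-ns time constant, the 11-μm diffusion length, and the reduction
factors of 0.63 (DH) and 0.35 (SH₂) quoted in the text.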


3.3 Current crowding 


The simplest method of coupling an LED to an optical fiber is to butt
the fiber end directly to the LED. To maximize this coupling, the
light-emitting area should be limited to the fiber core size. The
graded-index fiber used for the present study has a 55-μm diameter
core. The SH and DH LEDs were fabricated with a 50-μm contact to
localize the light-emitting area.

The SH and DH LEDs are Burrus-type diodes where a well has been
opened in the opaque n-GaAs substrate to access the generated light.
The current is confined to flow through the 50-μm contact by a SiNₓ
insulating coating. The current density and distribution in the active
region are determined by the active layer width and the sheet
resistivity of the active layer. A thicker layer width or lower sheet
resistivity results in a lower current density above the contact and a
larger emitting area. This current spreading reduces the coupling
efficiency and, in the DH case, the bimolecular recombination. An
improvement may be made by reducing the contact dimension, but a lower
limit of 25 μm is set by the use of an evaporation mask to define the
contacts.
Using the technique described by Joyce and Wemple,¹⁰ the calculated
current density (J) at the p-n junction above the contact is
plotted as a function of p-layer thickness in Fig. 7. The calculation
parameters are indicated on the figure. The current density is a strong
function of the p-layer thickness; it falls to half the maximum value
of 2500 A/cm² by w ≈ 1.5 μm. The other two curves in the figure are
the product of the current density and the calculated external
efficiency (η × J) of the SH and DH LEDs. The effect of the
bottom confinement layer for the DH structure is neglected in the
current density calculation, since the figure is meant to show only the
approximate behavior with thickness.

The calculation of the SH efficiency uses the 10¹⁹ cm⁻³ Ge-doped
GaAs parameters (listed on the figure) for comparison with fabricated
devices. The absorption coefficient and interface recombination
velocity are from Ref. 7, and the electron diffusivity is obtained from
Ref. 9. The electron diffusion length was measured using the
electron-beam-induced current technique.⁸ As shown in Fig. 7, the
η_SH × J curve shows a peak for an active layer width between 2 and
3 μm. η_SH × J is limited by surface recombination for thin layers and
self-absorption or current spreading for thick layers. The calculated
optimum active layer thickness is lower than the experimentally
determined value of ~5 μm for SH LEDs. However, small variations in
the many variables that went into the calculation may easily shift the
optimum value to ~5 μm.

Fig. 7. J, η_DH × J, and η_SH × J vs p-layer thickness. J is the
current density for the diode parameters indicated at a forward
current of 60 mA, η_DH is the DH efficiency for an 11-μm diffusion
length from Fig. 5b, and η_SH is the SH efficiency for a 5-μm
diffusion length from Fig. 5a.

In the η_DH × J calculation, the optimum active layer thickness is
determined to be between 0.2 and 0.7 μm. The material parameters
corresponding to 10¹⁷ cm⁻³ Ge doping were used. The diffusion length
was corrected for the effects of bimolecular recombination. A peak
occurs in η_DH × J for the reasons given in the SH case. The 0.5-μm
active layer width of the fabricated DH LEDs lies within the calculated
optimum range. From Fig. 7, η_DH × J for w = 0.5 μm is approximately
2½ times the peak value of η_SH × J.


IV. SUMMARY 


We have compared SH and DH LEDs designed for optical communications
at T3 data rate. Our analysis shows a factor-of-2 loss in the
internal efficiency of the SH LED due to the introduction of
nonradiative recombination centers at the high active layer doping
level required by the SH devices. An additional factor-of-2.5 loss was
found in the external efficiency of the SH LED, resulting from surface
recombination at the p-contact and the self-absorption and current
spreading in the thicker active layer of the SH LED. The total
factor-of-5 difference between SH and DH LEDs based on the arguments
in this paper is in reasonable agreement with the experimentally
determined value of 7 to 8, since the calculations involved many
variables. These results demonstrate that DH LEDs are required for
data links limited by power launched into the optical fiber.


Adaptive Echo Cancellation/AGC Structures
for Two-Wire, Full-Duplex Data Transmission 


By D. D. FALCONER and K. H. MUELLER 
(Manuscript received March 2, 1979) 


Three different receiver arrangements are studied, all of which 
incorporate provision for joint adaptive echo cancellation and gain 
adjustment to provide two-wire full-duplex data communication. In 
each case, the canceler consists of a data-driven transversal filter, 
but the architectures differ in the way the gain adjustment is provided. 
For all architectures, we investigate the properties of a joint adaptive 
LMS algorithm based on the receiver’s decisions on far-end data 
symbols and present appropriate computer simulations. We show 
that an arrangement where the gain control adjusts the reference 
level after the decision detector output performs significantly better 
than AGC schemes attempting to adjust the level of the analog signal.


I. INTRODUCTION


High-speed full-duplex data communication on a single channel is of 
immense practical interest. Data transmission via the DDD telephone 
network and the possibility of future digital subscriber lines are two of 
the most challenging applications. Techniques for achieving this goal 
fall in essentially three categories: Frequency Division Multiplexing 
(FDM), Time Division Multiplexing (TDM), and echo cancellation. Only 
echo cancellation allows full-bandwidth continuous use of the channel 
in each direction. This scheme therefore offers the highest potential 
bit rates. 

The transmitter and receiver are jointly coupled to a two-wire line
via a hybrid. In an environment of changing channel characteristics
(e.g., switched network), the hybrid balancing, if fixed, will at best
provide a compromise match to the channel. In this mode, a vestige of
the local transmitted signal, leaking through the hybrid, can be
expected to interfere with the incoming signal from the far-end
simultaneously operating transmitter. Figure 1 shows the system under
discussion, and Fig. 2 models the signals entering and leaving a
two-wire full-duplex modem. The local transmitter transmits a sequence
of data symbols {b(n)} at T-second intervals as a PAM data waveform.
The received waveform r(t) consists of a PAM data waveform with data
symbols {a(n)} transmitted also at T-second intervals from the distant
end, plus noise, plus the interfering vestige of the locally
transmitted signal. This interfering signal (which we shall refer to
as the echo signal or echo component) may have power comparable to or
even greater than that of the desired far-end signal component.

Decisions on the {a(n)} are made by quantizing the sampled receiver
output to ±1 in the case of binary data, or to one of M values in
the case of M-level data. A typically encountered echo component
arising in a system with a conventional compromise balanced hybrid
will cause an unacceptably high error rate.

To remove the interfering echo component, the local receiver must
perform echo cancellation; that is, estimate the echo signal and
subtract it from the incoming signal prior to making decisions, as
shown in Figs. 1 and 3a. The estimate is a transversally filtered
version of the local data symbols {b(n)}* as proposed in Ref. 1. If
the {b(n)} are binary, the implementation is simple, requiring only
additions and subtractions. The transversal filter tap coefficients
{p_k} should approximate the samples of the impulse response of the
combination of the local transmitter and the echo path.

* The {b(n)} may be different from the user data since they are
defined to include such operations as differential encoding or
scrambling.

Equivalently,’ the tap coefficients should be chosen to minimize, in 
a mean-square sense, the measured receiver error signal which is the 
difference between the actual receiver output y(n) and the ideal 
output. This error is available at each sampling instant of the received 
data. The subtraction of the (decision-directed) reference {â(n)} is,
of course, what makes it possible to adapt quickly even in the presence
of doubletalk. Such adaptation allows tracking time-varying components
of the echo channel or coping with larger call-to-call variations
in a switched system. However, since the level of the received signal is
likely to vary significantly under those conditions, it is essential that
proper scaling be done when the error signal is computed. Such scaling
involves a gain adjust device which must operate jointly and adaptively
with the echo canceler. This paper deals with this joint adaptation
problem. Although we perform our study for a system operating at
baseband, our results can be applied to passband structures via
appropriate redefinitions.

Fig. 3. Receiver structure with echo canceler and automatic gain
control. (a) Overall system showing various locations for AGC function.
(b) Basic AGC functions in one of above marked locations A, B, C, or D.

As shown in Fig. 3a, the error e(n) is the difference between the
receiver’s decision â(n) (assumed correct) and the receiver’s output
y(n). But if the end-to-end channel gain is α, y(n) consists of αa(n)
plus possibly noise and uncanceled echo. Intersymbol interference is
not treated in this study. It is expected to be of secondary concern in
typical subscriber cable systems, but we realize that it cannot be
neglected for high-speed DDD applications. The error e(n) in the
absence of gain control would contain the term (α − 1)a(n), which is
relatively large if α differs significantly from unity. Therefore, a small
steady-state mean-squared error, with consequent minimal fluctuation
of the canceler tap coefficients {p_k} and low error rate, is possible only
if the nonunity gain α is compensated for by an AGC adjusted to
provide a gain approximating 1/α. The gain of the AGC, denoted w,
will be considered to be adjusted jointly with the echo canceler tap
coefficients to minimize the mean-squared error.

Shown in Fig. 3a are four possible locations, A, B, C, and D, for
placement of the AGC whose functional form is depicted in Fig. 3b.
Since D is identical to B as long as the signal is binary and the
quantizer is ideal, only three variations need to be examined. The
corresponding three receiver architectures, labeled naturally A, B, and
C, may differ significantly in their adaptation speeds. The main theme
of this paper is the convergence of each of the three receiver
arrangements, and how it is influenced by channel parameters, such as
the relative powers of echo and far-end components.

The study concludes that arrangement C, which features an automatic
reference control (ARC) at the quantizer output, offers the fastest
convergence rate, together with the simplest implementation.
Arrangements A and B may suffer slower rates of convergence due to
coupling interaction between the AGC and echo canceler tap coefficient
adaptation. However, for arrangement B, a simple step-size modification
is proposed, which on the average decouples the AGC and echo canceler
adaptations, thus improving its convergence rate. Readers interested
in C, the “best” arrangement, may skip Sections III to V, which deal
with A and B.

Mention of earlier related work is in order at this point. Adaptive
echo cancellation without a jointly adapting AGC or equalizer for
two-wire full-duplex data communication has been treated in Refs. 1 to 4.
References 1 and 4 treat essentially the echo-cancellation system
proposed here, assuming α is known, and thus omit the AGC. This type
of scheme, in which the echo canceler’s input consists only of local
data symbols, offers obvious simplicity of implementation. In Ref. 2, a
voice-type canceler is investigated, and Ref. 3 discusses an external
data-driven structure which cancels the entire waveform by computing
compensation samples at a high enough rate.* Reference 5 summarizes
work reported in Refs. 1 and 3, and discusses a further receiver
arrangement comprising an adaptive echo canceler and adaptive equalizer
for mitigating end-to-end linear distortion. Arrangement A considered
here, the so-called “convex canceler,” is a special case of this, since an
AGC can be considered a one-tap linear equalizer. Reference 6 describes
an echo-cancelling receiver structure incorporating decision feedback
equalization. Finally, our arrangement C combined with decision
feedback equalization has recently been proposed in Ref. 7.


II. SYSTEM MODELING


We shall examine the convergence properties of several adaptation
strategies for the three receiver arrangements. Each follows a
decision-directed approach; AGC and echo canceler parameters are adjusted
once per symbol interval, based on the observed error between the
unquantized receiver output and the decision â(n) (for arrangements
A and B) or wâ(n) (for arrangement C). For purposes of analysis, the
decisions â(n) are assumed equal to the actual data symbols a(n). The
convergence of decision-directed adaptive receivers making small
adjustments at each iteration has been found to be negligibly affected by
occasional decision errors.

The analysis is based on a simple linear model of the end-to-end
channel and leakage paths: the signal r(t) entering the receiver from
the hybrid will be written as

$$r(t) = \sum_n a(n)\,g_A(t - nT) + \sum_n b(n)\,g_B(t - nT) + \nu(t). \tag{1}$$

The first summation represents the signal from the far end, and g_A(t)
is the end-to-end channel impulse response, including receiver
front-end filtering. The second summation is the echo signal, and g_B(t) is
the impulse response of the echo path. The symbol ν(t) is a waveform
of additive white noise. The symbol interval T is equal in both
summations. This is tantamount to assuming that the near- and far- 
end transmitters are synchronized in clock frequency.’ 

Decisions â(n) are made by quantizing samples of the receiver’s
output to ±1 in the case of binary data. The end-to-end channel
impulse response and the phase of the receiver’s sampling clock are
assumed ideal, so that no intersymbol interference is present in the
samples of r(t); i.e., the nth sample is of the form

$$r(nT) = \alpha\,a(n) + \sum_k b(k)\,\beta_{n-k} + \nu(nT), \tag{2}$$

where α = g_A(0), g_A(nT) = 0 for n ≠ 0, and β_n = g_B(nT). In subsequent
notation, we write r(nT) and ν(nT) as r(n) and ν(n), respectively. The
binary (±1) data symbols {a(n)} and {b(n)} are statistically
independent. The noise samples ν(n) are assumed to be independent with
zero mean and variance σ².

* In Ref. 2, the canceler input is the sampled transmitted waveform.
In Ref. 3, it is the {b(n)}.
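The sampled model of eq. (2) can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the function name `received_samples`, the finite causal echo response `beta` (taps β_1 to β_N), and the Gaussian noise are choices made here, not details given in the paper.

```python
import random

def received_samples(alpha, beta, sigma, num, seed=0):
    """Return (a_seq, r_seq): far-end symbols and samples modeled on eq. (2).

    beta[k-1] plays the role of beta_k, k = 1..N (assumed finite, causal
    echo response); both symbol streams are binary (+1/-1).
    """
    rng = random.Random(seed)
    n_taps = len(beta)
    # b_hist holds b(n-1), ..., b(n-N), most recent first.
    b_hist = [rng.choice((-1, 1)) for _ in range(n_taps)]
    a_seq, r_seq = [], []
    for _ in range(num):
        a = rng.choice((-1, 1))
        echo = sum(bk * x for bk, x in zip(beta, b_hist))
        r_seq.append(alpha * a + echo + rng.gauss(0.0, sigma))
        a_seq.append(a)
        # Shift in a fresh local symbol for the next sampling instant.
        b_hist = [rng.choice((-1, 1))] + b_hist[:-1]
    return a_seq, r_seq

a_seq, r_seq = received_samples(alpha=0.5, beta=[0.6, -0.3, 0.1],
                                sigma=0.1, num=20000)
power = sum(r * r for r in r_seq) / len(r_seq)
# The sample power should lie near alpha^2 + sum(beta_k^2) + sigma^2 = 0.72.
print(power)
```

The measured power of r(n) gives a quick sanity check on the independence assumptions, since the cross terms between a(n), b(n), and the noise should average out.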


III. ARRANGEMENT A


We recall from Fig. 3a that arrangement A forms the receiver output
as the sum of the echo canceler and AGC outputs. The receiver output
is a linear function of the echo canceler tap coefficients and AGC gain.
Thus the mean-squared error at the receiver’s output is a convex
quadratic function of the receiver parameters, and a simple gradient
algorithm can be used with confidence to adjust the parameters jointly.
As mentioned before, if the AGC is replaced by an adaptive linear
equalizer, the arrangement A generalizes to the jointly adaptive echo
canceler equalizer structure discussed in Ref. 5.

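Given an AGC gain w and a set of N echo canceler tap coefficients
{p_k}, k = 1, ..., N, the receiver’s unquantized output sample y(n) is

$$y(n) = w\,r(n) + \sum_{k=1}^{N} p_k\,b(n-k). \tag{3}$$

Define the N-dimensional vectors

$$\mathbf{p} = \begin{bmatrix} p_1 \\ \vdots \\ p_N \end{bmatrix} \quad\text{and}\quad \mathbf{b}(n) = \begin{bmatrix} b(n-1) \\ \vdots \\ b(n-N) \end{bmatrix},$$

and the (N + 1)-dimensional vectors in partitioned form as

$$\mathbf{c} = \begin{bmatrix} w \\ \mathbf{p} \end{bmatrix} \quad\text{and}\quad \mathbf{z}(n) = \begin{bmatrix} r(n) \\ \mathbf{b}(n) \end{bmatrix}.$$

Then (3) is written more compactly as

$$y(n) = \mathbf{c}^{t}\,\mathbf{z}(n), \tag{4}$$

where the superscript t denotes transpose. The vector c is the set of
receiver parameters to be adaptively adjusted, and z(n) is the current
set of inputs stored by the receiver.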

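The ideal output at time nT would be a(n), and the error is

$$e(n) = y(n) - a(n). \tag{5}$$

The expression for the mean-squared error is

$$\langle e(n)^2 \rangle = \mathbf{c}^{t} A\,\mathbf{c} - 2\,\mathbf{c}^{t}\mathbf{x} + 1, \tag{6}$$

where A is an (N + 1) by (N + 1) covariance matrix

$$A = \langle \mathbf{z}(n)\,\mathbf{z}(n)^{t} \rangle, \tag{7}$$

and x is an (N + 1)-dimensional vector

$$\mathbf{x} = \langle a(n)\,\mathbf{z}(n) \rangle. \tag{8}$$

By rewriting (6) as

$$\langle e(n)^2 \rangle = (\mathbf{c} - A^{-1}\mathbf{x})^{t} A\,(\mathbf{c} - A^{-1}\mathbf{x}) + 1 - \mathbf{x}^{t} A^{-1}\mathbf{x}, \tag{9}$$

and recognizing that A by definition is positive semidefinite, it is clear
that the mean-squared error has its minimum value

$$\epsilon_{\min} = 1 - \mathbf{x}^{t} A^{-1}\mathbf{x}, \tag{10}$$

when

$$\mathbf{c} = \mathbf{c}_{\rm opt} = A^{-1}\mathbf{x}. \tag{11}$$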


Using the independence assumptions for the data symbols and noise,
and the expression (2) for r(n), we readily find that A can be written
as

$$A = \begin{bmatrix} A_{00} & \boldsymbol{\beta}^{t} \\ \boldsymbol{\beta} & I \end{bmatrix}, \tag{12}$$

where

$$A_{00} = \langle r(n)^2 \rangle = \alpha^2 + \sum_k \beta_k^2 + \sigma^2, \tag{13}$$

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_N \end{bmatrix} \tag{14}$$

denotes the sampled echo impulse response truncated to N samples,
and I is the N-dimensional identity matrix. Note that, because of the
truncation involved in defining β, |β|² ≤ Σ_k β_k². Similarly, the vector
x is

$$\mathbf{x} = \begin{bmatrix} \alpha \\ \mathbf{0} \end{bmatrix}, \tag{15}$$

where 0 is an N-dimensional all-zero vector.
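As a worked check under the definitions (12) to (15), the optimum (11) can be evaluated in closed form by solving A c = x block by block. The shorthand δ² = σ² + Σ_k β_k² − |β|² follows the notation of (23a) and is used here only for compactness; the derivation is a sketch added for illustration, not part of the original analysis.

```latex
\begin{align*}
A_{00}\,w + \boldsymbol{\beta}^{t}\mathbf{p} &= \alpha, &
\boldsymbol{\beta}\,w + \mathbf{p} &= \mathbf{0}
\;\;\Longrightarrow\;\; \mathbf{p}_{\rm opt} = -\,w_{\rm opt}\,\boldsymbol{\beta}, \\
w_{\rm opt} &= \frac{\alpha}{A_{00} - |\boldsymbol{\beta}|^2}
             = \frac{\alpha}{\alpha^2 + \delta^2}, &
\epsilon_{\min} &= 1 - \alpha\,w_{\rm opt}
             = \frac{\delta^2}{\alpha^2 + \delta^2}.
\end{align*}
```

With negligible noise and an echo response fully spanned by the canceler (δ → 0), this reduces to w_opt = 1/α and p_opt = −β/α, with zero minimum error.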

Adaptive adjustment of the receiver parameter vector c can be
accomplished by employing the Widrow-Hoff LMS algorithm just as
in adaptive mean-square equalization. The current value of c at time
nT, c(n), is then updated according to

$$\mathbf{c}(n+1) = \mathbf{c}(n) - \gamma\,e(n)\,\mathbf{z}(n), \tag{16}$$

where γ is a constant step size. The average value of the correction
term −γe(n)z(n) is proportional to the negative of the gradient of the
mean-squared error with respect to c. Expression (16) portrays the
joint updating of the AGC gain and echo canceler taps:

$$w(n+1) = w(n) - \gamma\,e(n)\,r(n), \tag{17a}$$

$$\mathbf{p}(n+1) = \mathbf{p}(n) - \gamma\,e(n)\,\mathbf{b}(n). \tag{17b}$$


Adaptation of the echo canceler alone, according to (17b) with w
fixed at 1, has been analyzed by Mueller (Ref. 1). The rate of
convergence of p(n) to its optimum value −β for an optimum choice of γ
was shown to be determined only by the number of taps, rather than by
the detailed characteristics of the echo path.
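The joint update (17a) and (17b) can be sketched in a few lines. This is a minimal illustration, assuming correct decisions (â(n) = a(n)), a finite echo response `beta`, Gaussian noise, and the initial values w = 1, p = 0; the function name `lms_arrangement_a` and all parameter values are hypothetical.

```python
import random

def lms_arrangement_a(alpha, beta, sigma, gamma, num, seed=0):
    """Run the joint LMS update of eqs. (17a)/(17b) on simulated data."""
    rng = random.Random(seed)
    n_taps = len(beta)
    w, p = 1.0, [0.0] * n_taps               # assumed starting point
    b_hist = [rng.choice((-1, 1)) for _ in range(n_taps)]  # b(n-1)..b(n-N)
    sq_err = []
    for _ in range(num):
        a = rng.choice((-1, 1))
        r = (alpha * a + sum(bk * x for bk, x in zip(beta, b_hist))
             + rng.gauss(0.0, sigma))                        # eq. (2)
        y = w * r + sum(pk * x for pk, x in zip(p, b_hist))  # eq. (3)
        e = y - a                      # eq. (5), decision assumed correct
        w -= gamma * e * r                                   # eq. (17a)
        p = [pk - gamma * e * x for pk, x in zip(p, b_hist)] # eq. (17b)
        sq_err.append(e * e)
        b_hist = [rng.choice((-1, 1))] + b_hist[:-1]
    return w, p, sq_err

w, p, sq_err = lms_arrangement_a(alpha=1.0, beta=[0.5, -0.3],
                                 sigma=0.0, gamma=0.05, num=5000)
# In this noise-free case w should approach 1/alpha and p should approach
# -beta/alpha, the solution of A c = x.
print(w, p)
```

Both updates use the same error sample e(n), so the gain and the taps are adjusted jointly, once per symbol interval.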

The joint updating algorithm (16) resembles equalizer adaptation 
algorithms, whose convergence behavior has been extensively stud- 
ied.”’® The convergence of the more general version of (16), for joint 
echo cancellation and equalization, was discussed in Ref. 5. These 
theoretical studies have rested on an untrue assumption of indepen- 
dence of successive equalizer or canceler contents. However, a more 
rigorous analysis in Ref. 16, in addition to experimental results, sug- 
gests that the independence assumption does not cause serious error. 
The aforementioned studies have revealed that, for a fixed step size
coefficient γ, the speed of convergence is largely governed by the
spread of the eigenvalues of the matrix A defined by (12). Without
elaborating on the details, we can say that a ratio of
maximum-to-minimum eigenvalues which is close to unity leads to
relatively fast convergence, while slow convergence is associated with
a maximum-to-minimum eigenvalue ratio which is much greater than unity.
If the step size coefficient γ is chosen to effect a judicious
compromise between speed of convergence and noise due to random tap
fluctuations, then a system with a small eigenvalue spread would
typically converge in a number of iterations equal to a small multiple
of the number of adjustable tap coefficients.

The N + 1 eigenvalues of matrix A defined by (12), (13), and (14)
are readily found to consist of N − 1 unit eigenvalues plus λ_max and
λ_min, given by

$$\lambda_{\max,\min} = \tfrac{1}{2}\Bigl[\,1 + \alpha^2 + \sigma^2 + \sum_k \beta_k^2 \pm \sqrt{\Bigl(1 + \alpha^2 + \sigma^2 + \sum_k \beta_k^2\Bigr)^2 - 4\Bigl(\alpha^2 + \sigma^2 + \sum_k \beta_k^2 - |\boldsymbol{\beta}|^2\Bigr)}\,\Bigr], \tag{18}$$

where the plus is associated with λ_max and the minus with λ_min. It is
straightforward to show that

$$\lambda_{\min} \le 1 \le \lambda_{\max}.$$

It is interesting to point out that expression (18) for λ_max and λ_min
coincides with the expressions for bounds on the maximum and
minimum eigenvalues found in Ref. 5 for the more general
canceler/equalizer combination. Table I shows values of λ_max and λ_min
for various values of α² + σ² (the power of the far-end signal
component plus noise) and Σ_k β_k² (the power of the echo component),
assuming Σ_k β_k² = |β|², so that perfect echo cancellation is possible.

The ratio λ_max/λ_min increases rapidly as α² + σ² decreases. Thus a
system with a far-end signal component which is much weaker than
the near-end echo component would be expected to converge much
more slowly than one with a relatively strong far-end component. This
sensitivity of the convergence behavior to the relative strengths of
far-end and near-end signal components was also noted in Ref. 5 for the
canceler/equalizer system.
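The closed form (18) is easy to evaluate numerically. The sketch below assumes Σ_k β_k² = |β|² (no truncation loss), as in Table I; the function name `eigen_extremes` and its argument names are introduced here for illustration.

```python
import math

def eigen_extremes(far_power, echo_power):
    """Evaluate eq. (18) with sum_k beta_k^2 = |beta|^2.

    far_power  -- alpha^2 + sigma^2 (far-end signal plus noise power)
    echo_power -- |beta|^2 (echo power)
    Returns (lambda_min, lambda_max).
    """
    a00 = far_power + echo_power                    # eq. (13)
    disc = math.sqrt((1.0 + a00) ** 2 - 4.0 * (a00 - echo_power))
    return 0.5 * (1.0 + a00 - disc), 0.5 * (1.0 + a00 + disc)

# A few (far_power, echo_power) pairs of the kind tabulated in Table I.
for far, echo in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]:
    lo, hi = eigen_extremes(far, echo)
    print(f"{far:4.1f} {echo:4.1f}  min={lo:.4f}  max={hi:.3f}  ratio={hi / lo:.2f}")
```

Because the two nonunit eigenvalues sum to 1 + A_00 and multiply to A_00 − |β|², shrinking α² + σ² drives λ_min toward zero while λ_max stays near 1 + |β|², which is the spread the text describes.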


IV. ARRANGEMENT B 


In this arrangement, combining the received signal and echo canceler
output is done ahead of the AGC. This may appear as a more natural
arrangement, since the intent is to cancel the echo component at the
sampling instants before they enter the portion of the receiver devoted
to estimating a(n). The local data symbols are processed by the echo
canceler and AGC in tandem; the output is not a linear function of the
canceler tap coefficients and AGC gain, and the mean-squared error is
not a convex function of these receiver parameters. We can thus call
this second arrangement a “nonconvex canceler.”

Table I

Far-end signal power    Echo power    Eigenvalue            Ratio
plus noise power
α² + σ²                 |β|²          λ_min      λ_max      λ_max/λ_min

0.1                     1.0           0.0487     2.015      41.4
0.35                    1.0           0.1598     2.190      13.7
0.50                    1.0           0.2187     2.281      10.4
1.0                     1.0           0.3820     2.618      6.85
2.0                     1.0           0.5858     3.414      5.83
1.0                     2.0           0.2680     3.732      13.9

The receiver output is

$$y(n) = w\,[\,r(n) + \mathbf{p}^{t}\mathbf{b}(n)\,]. \tag{19}$$


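With r(n) modeled as in eq. (2), the mean-squared error for given
values of w and p is

$$\langle e(n)^2 \rangle = \langle (y(n) - a(n))^2 \rangle = w^2\Bigl[\alpha^2 + \sum_k \beta_k^2 + \sigma^2 + |\mathbf{p}|^2 + 2\,\mathbf{p}^{t}\boldsymbol{\beta}\Bigr] - 2w\alpha + 1, \tag{20}$$

which can be written as

$$\langle e(n)^2 \rangle = (w\alpha - 1)^2 + |\mathbf{p} + \boldsymbol{\beta}|^2\,w^2 + w^2\Bigl[\sigma^2 + \sum_k \beta_k^2 - |\boldsymbol{\beta}|^2\Bigr]. \tag{21}$$

The final term in brackets represents the effect of additive noise and
uncancellable echo, if any. Expression (21) can also be written as

$$\langle e(n)^2 \rangle = (\alpha^2 + \delta^2)(w - w_{\rm opt})^2 + w^2\,|\mathbf{p} - \mathbf{p}_{\rm opt}|^2 + \frac{\delta^2}{\alpha^2 + \delta^2}, \tag{22}$$

where

$$\delta^2 = \sigma^2 + \sum_k \beta_k^2 - |\boldsymbol{\beta}|^2 \tag{23a}$$

and

$$w_{\rm opt} = \frac{\alpha}{\alpha^2 + \delta^2}, \tag{23b}$$

$$\mathbf{p}_{\rm opt} = -\boldsymbol{\beta} \tag{23c}$$

are the parameter values which minimize ⟨e(n)²⟩. In practical systems,
δ is small and w_opt ≈ 1/α.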
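The step from (21) to (22) is a completion of the square in w. The short derivation below is a sketch added for illustration, using only the definitions (23a) to (23c):

```latex
\begin{align*}
(w\alpha - 1)^2 + w^2\delta^2
  &= (\alpha^2 + \delta^2)\,w^2 - 2\alpha w + 1 \\
  &= (\alpha^2 + \delta^2)\Bigl(w - \frac{\alpha}{\alpha^2 + \delta^2}\Bigr)^2
     + 1 - \frac{\alpha^2}{\alpha^2 + \delta^2} \\
  &= (\alpha^2 + \delta^2)(w - w_{\rm opt})^2
     + \frac{\delta^2}{\alpha^2 + \delta^2}.
\end{align*}
```

Adding the remaining term of (21), w²|p + β|² = w²|p − p_opt|², gives (22). The canceler error is weighted by w², which is why the error surface flattens as w approaches zero.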

It is instructive to plot contours of constant mean-squared error in
the plane whose coordinates are the AGC error

$$e_w = w - w_{\rm opt} \tag{24}$$

and canceler error magnitude

$$|\mathbf{e}_p| = |\mathbf{p} - \mathbf{p}_{\rm opt}|, \tag{25}$$

respectively; i.e., curves satisfying

$$\alpha^2 e_w^2 + \Bigl(e_w + \frac{1}{\alpha}\Bigr)^2 |\mathbf{e}_p|^2 = \mathrm{MSE} \tag{26}$$

(assuming δ is negligible), for various positive values of MSE. Such
contour plots are shown in Figs. 4 and 5 for values of the end-to-end
gain α of 0.5 and 1. Each plot shows 10 contours for MSE ranging from
0.2 to 2.0 in steps of 0.2. In each case, there is a unique minimum
(e_w = |e_p| = 0), but the MSE surface is not convex. Moreover, it exhibits
a kind of trough running along the line e_w = −1/α. The floor of this
trough slopes very gradually toward the origin for large values of |e_p|.
The shape of the contours for this system differs radically from the
ellipsoidal contours that are characteristic of the convex system. Those
contours are plotted in Figs. 6 and 7 for the same values of MSE and of
α and |β| = 1, as for the nonconvex system. The ellipses are plotted
for convenience for error values mapped onto the two eigenvectors μ_1
and μ_2 of the matrix A. The eccentricity of the ellipse is the ratio
λ_max/λ_min. Similarly, the convex system would converge slowly, starting
from zero-valued parameters, if α is small (implying λ_max/λ_min ≫ 1).

A gradient procedure (LMS algorithm) for jointly adjusting w and p
should, with proper choice of step size, converge to the optimum
parameters. It would, on the average, follow a path perpendicular to
the contours that it crosses. Thus very slow convergence of the
nonconvex system would be expected if the initial value of all
parameters is zero; i.e., starting at e_w = −(1/α), |e_p| = |β|.

An LMS algorithm for the nonconvex system is obtained by making
the correction terms proportional to the negative gradients of the
squared error. The resulting joint AGC/canceler updating algorithm
would then be

$$w(n+1) = w(n) - \gamma\,e(n)\,\bigl(r(n) + \mathbf{b}(n)^{t}\mathbf{p}(n)\bigr), \tag{27a}$$

$$\mathbf{p}(n+1) = \mathbf{p}(n) - \gamma\,e(n)\,w(n)\,\mathbf{b}(n), \tag{27b}$$

where

$$e(n) = w(n)\,\bigl(r(n) + \mathbf{b}(n)^{t}\mathbf{p}(n)\bigr) - a(n). \tag{27c}$$

Fig. 6. Contours of constant mean-squared error for arrangement A (α = 0.5, |β| = 1).

Fig. 7. Contours of constant mean-squared error for arrangement A (α = 1, |β| = 1).


V. REDUCING COUPLING EFFECTS ON THE NONCONVEX SYSTEM’S 
ADAPTATION 


Combining (27b) and (27c) results in the following algorithm for
updating p(n):

$$\mathbf{p}(n+1) = \mathbf{p}(n) - \gamma\,w(n)^2\,\mathbf{b}(n)\bigl[r(n) + \mathbf{b}(n)^{t}\mathbf{p}(n)\bigr] + \gamma\,w(n)\,a(n)\,\mathbf{b}(n). \tag{28}$$

It would be desirable to reduce or eliminate the coupling effect of w(n)
in this algorithm. Consider modifying the algorithm by replacing the
constant step size γ with γ/w(n)², where the second γ is a constant.
Then (28) becomes

$$\mathbf{p}(n+1) = \mathbf{p}(n) - \gamma\,\mathbf{b}(n)\bigl[r(n) + \mathbf{b}(n)^{t}\mathbf{p}(n)\bigr] + \frac{\gamma}{w(n)}\,a(n)\,\mathbf{b}(n). \tag{29}$$

The average value of the correction term for fixed p(n) is then
−γ[β + p(n)], which is the same as the average correction term for
an adaptive echo canceler with no AGC. Thus by the simple choice of
step size γ/w(n)², we can on the average remove the influence of AGC
adaptation on echo canceler adaptation. The only remaining coupling
stems from the term [γ/w(n)]a(n)b(n), whose mean value is zero. The
mean of the correction term in (27a) for updating the AGC gain is

$$-\gamma\,\bigl\langle e(n)\bigl(r(n) + \mathbf{b}(n)^{t}\mathbf{p}(n)\bigr)\bigr\rangle = -\gamma\,\alpha\,(w(n)\alpha - 1) - \gamma\,w(n)\,|\mathbf{p}(n) + \boldsymbol{\beta}|^2 - \gamma\,w(n)\Bigl(\sigma^2 + \sum_k \beta_k^2 - |\boldsymbol{\beta}|^2\Bigr), \tag{30}$$

in which coupling from the echo canceler adaptation is evident in the
middle term γw(n)|p(n) + β|². There is no simple way to eliminate
this coupling, apart from observing that p(n) + β is the error in the
echo canceler’s tap coefficients, which eventually dies away.

In summary, we propose the following algorithm for jointly updating
the AGC gain w(n) and the echo canceler tap coefficient vector p(n):

$$w(n+1) = w(n) - \gamma_1\,e(n)\,\bigl(r(n) + \mathbf{b}(n)^{t}\mathbf{p}(n)\bigr), \tag{31a}$$

$$\mathbf{p}(n+1) = \mathbf{p}(n) - \frac{\gamma_2}{w(n)}\,e(n)\,\mathbf{b}(n), \tag{31b}$$

where γ_1 and γ_2 may be different constants. Algorithm (31b) is, on the
average, uncoupled from (31a) and equivalent to the echo canceler
algorithm operating in solitude with step size γ_2. The latter algorithm
has been examined by Mueller (Ref. 1), who determined an optimum step
size γ equal to the reciprocal of the number of taps, and demonstrated
favorable convergence characteristics, independent of echo path
characteristics. The choice of γ_1 would best be made by experiment.
While the above decoupling modification was only heuristically
motivated, the simulations reported in Section VII confirm its
usefulness.
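The decoupled update (31a) and (31b) can be sketched as follows. This is a minimal illustration, assuming correct decisions, a finite echo response `beta`, Gaussian noise, a zero-valued canceler at the start, and a positive initial gain `w0`; the function name `lms_arrangement_b` and the parameter values are hypothetical.

```python
import random

def lms_arrangement_b(alpha, beta, sigma, gamma1, gamma2, num,
                      w0=1.0, seed=0):
    """Run the decoupled update of eqs. (31a)/(31b) on simulated data."""
    rng = random.Random(seed)
    n_taps = len(beta)
    w, p = w0, [0.0] * n_taps            # zero canceler start is assumed
    b_hist = [rng.choice((-1, 1)) for _ in range(n_taps)]  # b(n-1)..b(n-N)
    for _ in range(num):
        a = rng.choice((-1, 1))
        r = (alpha * a + sum(bk * x for bk, x in zip(beta, b_hist))
             + rng.gauss(0.0, sigma))                        # eq. (2)
        s = r + sum(pk * x for pk, x in zip(p, b_hist))  # AGC input
        e = w * s - a                  # eq. (27c), decision assumed correct
        step_p = gamma2 / w            # eq. (31b): step scaled by 1/w(n)
        p = [pk - step_p * e * x for pk, x in zip(p, b_hist)]
        w -= gamma1 * e * s                                  # eq. (31a)
        b_hist = [rng.choice((-1, 1))] + b_hist[:-1]
    return w, p

w, p = lms_arrangement_b(alpha=0.5, beta=[0.5, -0.3], sigma=0.0,
                         gamma1=0.05, gamma2=0.05, num=20000)
# In this noise-free case w should approach 1/alpha = 2 and p should
# approach -beta.
print(w, p)
```

The canceler step is computed from w(n) before the gain is updated, matching the time indices in (31b). The division by w(n) presumes that the gain stays away from zero, which the sketch does not guard against.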


VI. ARRANGEMENT C


Arrangement C, shown in Figs. 3a and 8a, simply omits the AGC for
purposes of making a decision on the binary symbol a(n); the quantizer
input, which is the algebraic sum of the channel output sample and
the echo canceler output, is hardlimited to ±1. Note that, if the {a(n)}
are binary symmetric (a(n) = ±1) data symbols, then the attenuation
of the end-to-end channel is irrelevant and no explicit AGC is necessary
for making a decision. This comment also applies to other baseband
data symbol formats such as diphase, and also to phase-modulated
signals, in which case the data symbols a(n) are numbers lying on the
unit circle in the complex plane.

1606 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1979

To enable adaptive adjustment of the echo canceler in arrangement 
C, the receiver’s decision â(n) is scaled by an adjustable coefficient w
before being subtracted from the unquantized output to form the error 
which is used to update the tap coefficients. The coefficient w thus has 
the role of an automatic reference control (ARC), rather than an AGC, 
since it adjusts the receiver’s reference signal to a level commensurate 
with the attenuation of the end-to-end channel. It is adjusted jointly 
with the echo canceler tap coefficients. Moreover, it multiplies a 
discrete-valued data symbol, not a continuous-valued channel output, 
and so digital implementation of the receiver is simplified. 

This receiver arrangement can also be applied to multiamplitude
data formats as shown in Fig. 8b. In the multiamplitude case, the
quantizer compares the analog signal with reference levels which
require proper scaling in relation to its amplitude. The quantity w
provides this information and can thus directly serve as a reference
input to the quantizer as shown in Fig. 8b. A less attractive alternative
would be to multiply the input signal by 1/w to bring it to a constant
level (as with the AGC schemes) and then apply it to a quantizer with
a fixed reference.

Fig. 8. Decision-making and automatic reference control in arrangement C.
(a) Case of binary data symbols (hard limiter). (b) Case of multilevel
data symbols (quantizer).

Given a set of N echo canceler tap coefficients {p_k}, k = 1, ..., N,
the receiver’s output sample y(n), which will subsequently be quantized
to form the output data symbol â(n), is

$$y(n) = r(n) + \sum_{k=1}^{N} p_k\,b(n-k). \tag{32}$$

Ideally, the coefficients {p_k} should approximate the negatives of
the echo channel’s impulse response samples {β_k}, k = 1, ..., N, so
that the echo is canceled and y(n) consists of αa(n) plus noise.

The desired value of y(n) is wa(n), where w is an estimate of the
end-to-end channel gain α. The error between the actual and desired
receiver outputs, measured at the nth symbol interval, is then

$$e(n) = r(n) + \sum_{k=1}^{N} p_k\,b(n-k) - w\,a(n). \tag{33}$$

The ARC parameter w implicitly assumes the role of an AGC, multiplying
the desired receiver output data symbol rather than the receiver
input.* The parameters {p_k} and w are to be jointly adjusted with the
aim of minimizing the mean-squared value of e(n).

* In arrangements A and B, the ideal value of w equals 1/α; in
arrangement C, this value equals α.

For notational compactness, define the N-dimensional vectors

$$\mathbf{p} = [\,p_1, p_2, \cdots, p_N\,]^{t} \tag{34a}$$

and

$$\mathbf{b}(n) = [\,b(n-1), b(n-2), \cdots, b(n-N)\,]^{t}, \tag{34b}$$

and the (N + 1)-dimensional partitioned vectors

$$\mathbf{c} = [\,-\mathbf{p}^{t} \;\vdots\; w\,]^{t} \tag{35a}$$

and

$$\mathbf{u}(n) = [\,\mathbf{b}(n)^{t} \;\vdots\; a(n)\,]^{t}. \tag{35b}$$

Then the unquantized receiver output is written

$$y(n) = r(n) + \mathbf{p}^{t}\mathbf{b}(n), \tag{36}$$

and the error is

$$e(n) = r(n) - \mathbf{c}^{t}\mathbf{u}(n). \tag{37}$$

Squaring both sides of (37) and taking the expectation, we find that
the mean-squared error is

$$\langle e(n)^2 \rangle = \sigma^2 + |\boldsymbol{\epsilon}|^2, \tag{38}$$

where ε = c − s and

$$\mathbf{s} = \langle r(n)\,\mathbf{u}(n) \rangle = \begin{bmatrix} \boldsymbol{\beta} \\ \alpha \end{bmatrix}. \tag{39}$$

The minimum mean-squared error is σ², and the excess mean-squared
error at time nT is defined as ⟨|ε(n)|²⟩, where

$$\boldsymbol{\epsilon}(n) = \mathbf{c}(n) - \mathbf{s} \tag{40}$$

is the difference between the tap coefficient vector c(n) at time nT
and its optimum value s.

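A simple gradient algorithm for updating c(n) is

$$\mathbf{c}(n+1) = \mathbf{c}(n) + \gamma\,e(n)\,\mathbf{u}(n), \tag{41}$$

where γ is a constant. (Note from expression (37) that e(n)u(n) is
proportional to the gradient of e(n)² with respect to c.) We shall
examine the convergence of the excess mean-squared error ⟨|ε(n)|²⟩,
as in earlier studies, assuming that successive input vectors u(n) are
statistically independent. Subtracting s from both sides of (41), we
have

$$\boldsymbol{\epsilon}(n+1) = \boldsymbol{\epsilon}(n) + \gamma\,e(n)\,\mathbf{u}(n). \tag{42}$$

Since the N + 1 components of u(n) are ±1,

$$|\mathbf{u}(n)|^2 = N + 1. \tag{43}$$

Also the average of the inner product

$$\langle \boldsymbol{\epsilon}(n)^{t}\mathbf{u}(n)\,e(n) \rangle = \langle \boldsymbol{\epsilon}(n)^{t}\mathbf{u}(n)\,(r(n) - \mathbf{u}(n)^{t}\mathbf{c}(n)) \rangle = \langle \boldsymbol{\epsilon}(n)^{t}(\mathbf{s} - \mathbf{c}(n)) \rangle = -\langle |\boldsymbol{\epsilon}(n)|^2 \rangle, \tag{44}$$

where we have used the assumption that u(n) is independent of
u(n − 1), and therefore of c(n). Using (38), (43), and (44), after squaring
and averaging both sides of (42) we get an equation describing the
evolution of the excess mean-squared error

$$\langle |\boldsymbol{\epsilon}(n+1)|^2 \rangle = \langle |\boldsymbol{\epsilon}(n)|^2 \rangle\,[\,1 - 2\gamma + \gamma^2(N+1)\,] + \gamma^2(N+1)\,\sigma^2. \tag{45}$$

This expression is readily iterated to yield

$$\langle |\boldsymbol{\epsilon}(n)|^2 \rangle = [\,1 - 2\gamma + \gamma^2(N+1)\,]^{n}\,\langle |\boldsymbol{\epsilon}(0)|^2 \rangle + \frac{\gamma(N+1)\,\sigma^2\,\bigl\{1 - [\,1 - 2\gamma + \gamma^2(N+1)\,]^{n}\bigr\}}{2 - \gamma(N+1)}. \tag{46}$$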
To guarantee convergence, the constant γ must be such that

$$0 < \gamma < \frac{2}{N+1}.$$

Then the first term in (46) is a transient, which eventually decays to
zero and the steady-state value lim ⟨|ε(n)|²⟩ is

$$\lim_{n \to \infty} \langle |\boldsymbol{\epsilon}(n)|^2 \rangle = \frac{\gamma(N+1)\,\sigma^2}{2 - \gamma(N+1)}. \tag{47}$$

The steady-state total mean-squared error is this plus σ², or

$$\lim_{n \to \infty} \langle e(n)^2 \rangle = \frac{2\sigma^2}{2 - \gamma(N+1)}. \tag{48}$$

Expressions (46) and (47), describing the evolution of the excess
mean-squared error of arrangement C, are of the same form as those
derived in Ref. 1 for the echo canceler alone, with no AGC. The only
difference is that (N + 1) replaces N (as a result of the extra AGC
coefficient w). Note that the convergence rate of ⟨e(n)²⟩ is independent
of all channel parameters, in contrast to the dependence of the
convergence rates of arrangements A and B on the relative echo and
far-end signal powers.

Fig. 9. Reduction of mean-squared error. Arrangement A: w_0 = 1, γ = 0.01, 16 taps.

Fig. 10. Reduction of mean-squared error. Arrangement A: α = 0.316, γ = 0.01, 16 taps.
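The recursion (45) and its closed form (46) can be compared numerically. The sketch below is an arithmetic check only; the function names and the parameter values (16 taps, γ = 0.01, σ² = 0.01, unit initial excess error) are illustrative assumptions.

```python
def contraction_factor(gamma, n_taps):
    """Per-step factor 1 - 2*gamma + gamma^2*(N+1) appearing in eq. (45)."""
    return 1.0 - 2.0 * gamma + gamma ** 2 * (n_taps + 1)

def excess_mse_by_recursion(eps0, gamma, n_taps, sigma2, steps):
    """Iterate eq. (45) starting from the excess error eps0."""
    rho = contraction_factor(gamma, n_taps)
    drive = gamma ** 2 * (n_taps + 1) * sigma2
    value = eps0
    for _ in range(steps):
        value = rho * value + drive
    return value

def excess_mse_closed_form(eps0, gamma, n_taps, sigma2, steps):
    """Evaluate eq. (46); 'steady' is the limit given in eq. (47)."""
    rho = contraction_factor(gamma, n_taps)
    steady = gamma * (n_taps + 1) * sigma2 / (2.0 - gamma * (n_taps + 1))
    return rho ** steps * eps0 + steady * (1.0 - rho ** steps)

rec = excess_mse_by_recursion(1.0, 0.01, 16, 0.01, 200)
closed = excess_mse_closed_form(1.0, 0.01, 16, 0.01, 200)
print(rec, closed)  # the two values should agree to rounding error
```

The contraction factor depends only on γ and N, which is the sense in which the convergence rate of arrangement C is independent of the channel parameters.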

The choice of the adaptation coefficient γ reflects a compromise
between fast adaptation and small steady-state mean-squared error. A
suitable choice, proposed in Refs. 1 and 11, is

$$\gamma = \frac{1}{N+1}, \tag{49}$$

which yields a steady-state mean-squared error of twice the minimum
mean-squared error σ². With this choice of γ substituted in (46), we
find that the excess mean-squared error decreases toward its
steady-state value (47) at a rate of 4.34/(N + 1) dB per adjustment.
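Arrangement C with the step size γ = 1/(N + 1) can be sketched as below. The simulation assumes correct decisions (â(n) = a(n)), a finite echo response `beta`, Gaussian noise, and zero initial values; the names `lms_arrangement_c` and `decay_db_per_step` are hypothetical.

```python
import math
import random

def lms_arrangement_c(alpha, beta, sigma, num, seed=0):
    """Run update (41), written out for p and w, with gamma = 1/(N+1)."""
    rng = random.Random(seed)
    n_taps = len(beta)
    gamma = 1.0 / (n_taps + 1)
    p, w = [0.0] * n_taps, 0.0           # assumed zero starting point
    b_hist = [rng.choice((-1, 1)) for _ in range(n_taps)]  # b(n-1)..b(n-N)
    for _ in range(num):
        a = rng.choice((-1, 1))
        r = (alpha * a + sum(bk * x for bk, x in zip(beta, b_hist))
             + rng.gauss(0.0, sigma))                        # eq. (2)
        y = r + sum(pk * x for pk, x in zip(p, b_hist))      # eq. (32)
        e = y - w * a                  # eq. (33), decision assumed correct
        # Eq. (41) with c = [-p : w]: taps move against e*b, ARC with e*a.
        p = [pk - gamma * e * x for pk, x in zip(p, b_hist)]
        w += gamma * e * a
        b_hist = [rng.choice((-1, 1))] + b_hist[:-1]
    return w, p

def decay_db_per_step(n_taps):
    """Decay of the transient in eq. (46), in dB per adjustment."""
    gamma = 1.0 / (n_taps + 1)
    rho = 1.0 - 2.0 * gamma + gamma ** 2 * (n_taps + 1)
    return -10.0 * math.log10(rho)

w, p = lms_arrangement_c(alpha=0.1, beta=[0.5, -0.3, 0.2],
                         sigma=0.0, num=2000)
print(w, p)                    # expected to approach alpha and -beta
print(decay_db_per_step(16))   # close to 4.34 / 17 dB per adjustment
```

The ARC coefficient multiplies only the ±1 symbol, so its update needs no multiplication by a continuous-valued sample. The per-step decay follows from 10 log10(1 − 1/(N + 1)) ≈ −4.34/(N + 1) for large N.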


VII. SIMULATION


Computer simulations afforded a comparison of the actual convergence
behavior of the three arrangements. Results of these simulations
are shown in Figs. 9 through 15. In each case, the same echo channel
is used in combination with a 16-tap canceler and γ = 0.01. The echo
power is normalized to unity, and the sampled echo response is
truncated after 16 samples, so that perfect cancellation could be
obtained via a set of proper coefficients. For the nonconvex
arrangement B, only the decoupled updating algorithm (31) has been used
since some initial runs without this modification showed convergence
problems for a variety of parameter choices.

Fig. 11. Reduction of mean-squared error. Arrangement A, three curves:
α = 0.1, w_0 = 10; α = 0.316, w_0 = 3.16; α = 1, w_0 = 1. Initial value
of AGC is 1/α; γ = 0.01, 16 taps.

In both arrangements A and B, a uniform, nice exponential convergence
seems to be the exception rather than the rule. For weak received
signals, the mean-squared error often initially reduces rapidly up to a
certain point, after which convergence can become very slow. This
problem appears to exist even in the case where the AGC is preset to
the reciprocal of the received signal level. One explanation for this
behavior is that, in those cases where the initial mean-squared error is
dominantly caused by a misadjustment in only one loop, the overall
behavior at first approximates that of a system where either w or p is
the only parameter to be adjusted. This provides a relatively fast
reduction of the initial error to the point where a joint improvement
of w and p is required. Once this situation is realized, a much slower
convergence rate is expected (as we have pointed out in earlier
sections), in particular with weak received signals. In the latter case,
the value of w must become large and any inaccuracies in p are magnified.

Fig. 12. Reduction of mean-squared error. Arrangement B: w_0 = 1, γ = 0.01, 16 taps.

Fig. 13. Reduction of mean-squared error. Arrangement B: α = 0.316, γ = 0.01, 16 taps.

Fig. 14. Reduction of mean-squared error. Arrangement B, three curves:
α = 0.1, w_0 = 10; α = 0.316, w_0 = 3.16; α = 1, w_0 = 1. Initial value
of AGC is 1/α; γ = 0.01, 16 taps.

Although we do not fully understand the dynamics of these systems,
we feel it is worthwhile to present these results to point out the
inherent problems to communication system designers. Further work
would be required to provide a more thorough insight into these
systems, but it is our feeling that this may be of more academic interest
since our analysis has already shown that arrangement C provides as
attractive a solution to the problem as one could possibly wish. This
is demonstrated in Fig. 15. Only a single curve is presented, but actual
simulations have shown that channel attenuation and initial ARC value
can be varied by orders of magnitude and all such resulting curves
would be essentially identical to the one shown in Fig. 15. The fact
that simulations tend to deliver more rapid convergence than
theoretically predicted is, of course, due to the programmed “ideal
randomness” of the data sequence used in these simulations, whereas the
analytic result is an average which includes many sequences that will
not provide convergence at all (e.g., constant zeros, ones, or a dotting
pattern).

Fig. 15. Reduction of mean-squared error. Arrangement C: 16 taps, γ = 1/(N + 1).


VIII. DISCUSSION AND SUMMARY


Three arrangements of joint adaptive echo cancellation and gain 
control for full duplex data transmission have been examined. Each 
corresponds to a different AGC location. The third, arrangement C, has
proven superior in two respects:

(i) Its convergence rate depends only on the adaptation coefficient
and on the number of adjustable tap coefficients. On the other hand,
arrangement A suffers slower convergence as the ratio of echo power
to distant signal power increases. Arrangement B’s echo canceler can,
by proper choice of adaptation coefficient, on the average be made to
converge at a rate independent of channel parameters, but the convergence
of its gain control still depends on the gain of the forward
channel.

(ii) Arrangement C offers simpler digital implementation; the gain 
w multiplies a data symbol, instead of a finely quantized channel 
output or receiver output. Multiplication in arrangement C becomes 
addition or subtraction in the case of binary symbols. 

Although the foregoing analyses presuppose binary data symbols, 
each receiver arrangement accommodates multilevel symbols. 

Severe intersymbol interference may necessitate forward equaliza- 
tion. If so, arrangement A with the single coefficient w(n) replaced by 
an adaptive transversal filter is necessary. This case is discussed in 
Ref. 5. The favorable convergence properties and hardware simplicity 
of arrangement C may be retained if adaptive decision feedback 
equalization with a fixed forward equalizer is used.°” 


Laser Transmitters for 70-MHz Entrance Links


by F. S. CHEN, M. A. KARR, and P. W. SHUMATE 
(Manuscript received March 8, 1979) 


Lightwave transmitters providing an amplitude-modulated light 
output from injection lasers have been designed and tested. Six 
transmitters were assembled using GaAlAs double-heterostructure 
lasers and displaying (i) third-order intermodulation products 
>26 dB below the fundamental at 80-percent modulation index and 
0.6 to 0.8 mW average optical output; (ii) excess noise ratios <1.5 dB;
and (iii) frequency response within ±0.6 dB from 0.1 to 100 MHz.
These transmitters were burned-in at room temperature for 300 h and 
used with entrance links in a satellite experiment. 


I. INTRODUCTION


An experiment to evaluate lightwave technology as a means for 
providing inexpensive, wideband, reliable entrance links between re- 
mote satellite earth stations and video/telephone operating centers is 
being performed at Bell Laboratories locations at Holmdel, N.J. and 
Naperville, Ill.’ In this experiment, either light-emitting diodes (LEDs) 
or injection-laser diodes (ILDs) are intensity-modulated with a 70-MHz 
electrical signal that is frequency-modulated by baseband signals (IM/ 
FM). Analog receivers using either p-i-n photodiodes or avalanche 
photodiodes (APDs) convert the modulated light back into electrical 
format. The frequency-modulated 70-MHz carrier occupies the band 
70 ± 20 MHz. There is also a possibility of sending several narrowband
FM carriers occupying the same band. The baseband signal would be 
either a single 6-MHz diplexed TV channel or 1200 multiplexed voice 
channels occupying a similar bandwidth. Others have also reported 
the application of fiber optics to satellite entrance links.”” 

The design objective for the transmitter was to achieve a carrier-to- 
noise ratio (CNR) of at least 35 dB. If an injection laser were used as the
source, a simple analysis of signal-to-noise ratio showed that, to meet
this requirement, an average power of -10 dBm must be launched into
the optical fiber.* This assumes 50-percent modulation index, 10 dB of 
fiber and connector loss, 10-dB excess noise from the laser, and a p-i-n
photodiode as a detector. The receiver amplifier is of the transimpedance
type with a transimpedance of 4 kΩ and a noise resistance of
2 kΩ. Because of the large laser excess noise assumed in this analysis,
the use of an APD would not improve the receiver sensitivity. A
launched power level of -10 dBm can be easily met using injection
lasers. On the other hand, if an LED were used as a source and an APD
as a detector, the required power level would be -15 dBm. Burrus-
type LEDs can easily meet this requirement. For a single-carrier signal, 
distortion arising from nonlinearities in the lasers or photodetectors is 
relatively unimportant. However, for a multi-carrier signal, distortion 
products will have the same effect on the FM detection process as
noise, and thus the sum of noise and distortion must be kept 35 dB 
below the carrier level. The linearity requirements could be met using 
selected injection lasers and Burrus-type LEDs, although the distortion 
may increase with aging in the case of lasers.” Nevertheless, it was 
decided to use lasers as sources because they could launch much larger 
power into the fiber and because of their long lifetimes.® 
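
The launched-power figure quoted above can be checked with a rough link budget. The sketch below is not the analysis cited in the text; it is a minimal estimate that uses the stated parameters (50-percent modulation index, 10 dB of loss, 10-dB laser excess noise, 2-kΩ noise resistance) together with three assumptions that do not appear in the paper: a p-i-n responsivity of 0.5 A/W, a 40-MHz noise bandwidth (the 70 ± 20 MHz band), and a 300 K noise temperature. The noise resistance is treated here as an equivalent thermal-noise resistance referred to the amplifier input, which is also an assumption.

```python
import math

Q = 1.602e-19          # electron charge, C
K_T = 1.381e-23 * 300  # thermal energy at an assumed 300 K, J


def cnr_db(launched_dbm, loss_db=10.0, mod_index=0.5, excess_noise_db=10.0,
           noise_resistance=2e3, responsivity=0.5, bandwidth=40e6):
    """Estimate the carrier-to-noise ratio (dB) of a p-i-n receiver.

    responsivity (A/W) and bandwidth (Hz) are illustrative assumptions.
    """
    received_w = 1e-3 * 10 ** ((launched_dbm - loss_db) / 10)
    i_avg = responsivity * received_w                  # average photocurrent, A
    carrier = (mod_index * i_avg) ** 2 / 2             # mean-square carrier current
    # Shot noise scaled by the laser excess-noise factor.
    shot = 2 * Q * i_avg * bandwidth * 10 ** (excess_noise_db / 10)
    # Amplifier noise modeled as thermal noise of the noise resistance.
    thermal = 4 * K_T * bandwidth / noise_resistance
    return 10 * math.log10(carrier / (shot + thermal))


if __name__ == "__main__":
    print(f"CNR at -10 dBm launched: {cnr_db(-10.0):.1f} dB")
```

Under these assumptions the estimate lands near the 35-dB objective for -10 dBm launched power; different responsivity or bandwidth values would shift the result accordingly.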

This paper first describes the transmitter design, including circuitry 
and packaging. Next, intermodulation (IM) distortion and excess-noise 
measurements on the transmitters are presented. Finally, observations 
of transmitter performance during burn-in are described. 

We designed and assembled nine analog transmitters using 12-µm
stripe, GaAlAs double-heterostructure injection lasers.’ These transmitters
were burned-in for 300 hours and six of the nine were used for
the field experiment. The receivers, not described in this paper, were
equalized versions® of a previous design,’ with a response of ±0.1 dB
from a few kilohertz to 100 MHz.


II. TRANSMITTER CIRCUITRY AND PACKAGING


An injection laser is a threshold device with a light-current (L-I) 
transfer characteristic as shown in Fig. 1. Below threshold, light output 
is spontaneous LED light. Above threshold, the light is a coherent, 
lasing output. For analog modulation, the laser clearly must be biased 
above threshold, for example at point B in Fig. 1, in the center of a 
linear lasing region. The optical output then follows amplitude varia- 
tions in the device current. 

Transmitter design is complicated, however, by the fact that the 
threshold, and hence the operating point, are strong functions of 
temperature and device aging. Therefore, the operating point B must 
be stabilized under feedback control of some sort. 

A previously designed digital circuit incorporating such feedback
control’’ has been modified for high-frequency analog modulation. The 
modified circuit consists of two distinct parts: the driver and the 


1618 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER, 1979 







[Figure 1 labels: light output vs. laser current, with the threshold, the input signal, and the output signal marked.]


Fig. 1—Analog modulation of an injection laser. 


feedback control itself (see Fig. 2). The driver is a one-transistor circuit
which converts the input signal voltage to a collector current which
flows through the laser. Voltage measured across the 10-Ω emitter
resistor of Q1 at a test point allows monitoring of the modulation
current. The frequency response of the modulated light output using
this driver was within ±0.6 dB from 0.1 to 100 MHz. The second- and
third-order intermodulation products in the drive current were at least
42 dB and 52 dB, respectively, below the fundamental. As the next
section shows, the distortion of the circuit can therefore be ignored in
comparison with distortions arising in the laser.

An optical-fiber tap'’ samples the average optical output for feed- 
back control. In a standard closed-loop configuration, the laser bias is 
controlled so as to maintain the level of this sample constant on a
3-ms time scale. Therefore, the bias does not change in response to
modulation of frequencies above 300 Hz, but does limit the low- 
frequency response of the transmitter. (Experiments have shown, 
however, that the time constant can be made much longer so that 
frequency response is attainable down to tens of hertz.) 

The laser was mounted in a hermetically sealed package.’* The laser 
emission was coupled through a pigtail fiber with a hemispherical lens 

Fig. 2—Laser transmitter circuit.


at its tip to improve the coupling efficiency (about 30 to 50 percent). 
The optical tap in the pigtail assembly diverted about 10 percent of 
the power in the fiber to a p-i-n photodiode for controlling the laser 
operating point. There was also a p-i-n photodiode in the package for 
monitoring the back-mirror laser emission. The laser package and the 
electronic circuits were mounted on a printed-circuit board. 


III. INTERMODULATION DISTORTION


Intermodulation products were measured by applying two signals of
equal amplitude at frequencies f1 = 25 MHz and f2 = 30 MHz to the
laser through a transistor driver (Q1 in Fig. 2). The laser was dc-biased
to emit 2 mW of optical power from its front mirror (P_FM) although,
infrequently, measurements were made at higher average power levels.
The modulated light was detected using a p-i-n photodiode and its
output power was measured with a spectrum analyzer. The fundamental
power (either the f1 or the f2 component), one of the second-order
products (f1 + f2 = 55 MHz), and one of the third-order products
(2f2 - f1 = 35 MHz) were measured as a function of modulation index m.
Intermodulation products from the laser could be measured to about 50 dB
below either the f1 or the f2 component (56 dB below the sum of the f1
and f2 components) using this apparatus. Results for one laser having a fairly
linear L-I characteristic are shown in Fig. 3, where V_sig is the peak
input signal voltage per tone to the transistor driver. One notices that,
at P_FM = 2 mW (shown in solid lines), the fundamental
component (either f1 or f2) of the output power of the photodiode is
increased by 6 dB, the second-order component by 12 dB, and the
third-order component by 18 dB as V_sig is doubled. This is typical of a
nonlinear transfer characteristic that can be expressed as a Taylor series.
This relationship did not hold at P_FM = 3 mW (shown in broken lines)
where a large swing of V_sig extended to a region of greater L-I
nonlinearity.
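
The 6-, 12-, and 18-dB steps follow directly from a weakly nonlinear transfer characteristic written as a power series. The sketch below illustrates this numerically for the two-tone frequencies used in the measurement. The polynomial coefficients (a1, a2, a3) and the drive amplitudes are arbitrary illustrative values, not fitted to any laser in this paper; the record length is chosen so that 25, 30, 35, and 55 MHz fall on exact analysis bins.

```python
import math


def tone_amplitude(samples, cycles):
    """Amplitude of the component completing `cycles` periods in the record."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * cycles * k / n) for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * cycles * k / n) for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


def two_tone_levels_db(v, a1=1.0, a2=0.05, a3=0.02, n=200, f1=25, f2=30):
    """Levels (dB) at f1, f1 + f2, and 2*f2 - f1 for per-tone amplitude v.

    The device is modeled as y = a1*x + a2*x**2 + a3*x**3 (assumed form).
    With n = 200 samples at 200 MHz, frequencies in MHz equal bin numbers.
    """
    x = [v * (math.cos(2 * math.pi * f1 * k / n) + math.cos(2 * math.pi * f2 * k / n))
         for k in range(n)]
    y = [a1 * s + a2 * s ** 2 + a3 * s ** 3 for s in x]

    def to_db(amplitude):
        return 20 * math.log10(amplitude)

    return (to_db(tone_amplitude(y, f1)),
            to_db(tone_amplitude(y, f1 + f2)),
            to_db(tone_amplitude(y, 2 * f2 - f1)))


if __name__ == "__main__":
    low = two_tone_levels_db(0.01)
    high = two_tone_levels_db(0.02)   # drive doubled
    for name, a, b in zip(("fundamental", "second-order", "third-order"), low, high):
        print(f"{name}: +{b - a:.2f} dB when the drive is doubled")
```

For small drive levels the three steps come out close to 6, 12, and 18 dB; at larger drive the cubic term also compresses the fundamental, which is one way the simple slopes can break down.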

The second-order components can be excluded from the useful band
by choosing a proper frequency-multiplexing scheme. However, the
third-order components fall within the band and they degrade the
effective CNR. From Fig. 3, the third-order component (2f2 - f1) at
m = 0.8 was 37 dB below the f1 component; therefore, the carrier-to-distortion
ratio (CDR) is 10 log P(2f2 - f1)/2P(f1) = 42 dB. The largest
third-order products are of the form f1 + f2 - f3 and are 6 dB larger
than the type 2f2 - f1.'* Thus the worst CDR of the distortion products
of this laser would still be 37 dB, exceeding the requirement for the
multicarrier FM applications (35 dB). However, the uncertainty in the
aging behavior of these lasers makes them less suitable for these
applications. It has been observed that nonlinearities in the L-I
characteristic are not always stable over extended periods of time.

Fig. 3—Fundamental, second-, and third-order intermodulation components vs 2V_sig
at P_FM = 2 mW and 3 mW for a 12-µm stripe laser. V_sig is the peak input voltage per tone
to the transistor driver. The slopes of the lines are 6 dB/octave, 12 dB/octave, and 18
dB/octave. Modulation depth m is also shown.


Both the severity of a nonlinearity and its position on the L-I
characteristic can change during periods of hundreds to thousands of 
hours.” 

For other 12-µm stripe lasers with various degrees of nonlinearity in
their L-I characteristics, the third-order components varied from 17
to 46 dB (average 33 dB) below the fundamental at 2 mW average
power (P_FM) and m = 0.8.

The second-order intermodulation components of these lasers varied
from 16 to 40 dB (average 29 dB) below the fundamental at P_FM = 2
mW and m = 0.8.

Intermodulation products from a linear 8-µm stripe laser’ were also
measured for average powers up to 5 mW. The results are shown in
Fig. 4. The distortion for a given depth of modulation improved as the
average laser power was increased. At 2.56 mW and m = 0.8, the third-order
products were about the same as for linear 12-µm stripe lasers
(~40 dB below fundamental).

Intermodulation products from a strip-buried-heterostructure (SBH) 
laser'® were also measured. At an average power of 4 mW and m = 0.8, 
the third-order component was down by about 48 dB. Further improve- 
ment in the sensitivity of the measuring setup is necessary to determine 
such small distortion more accurately. In addition to having a linear 
L-I characteristic extending up to 100 mW/mirror under pulsed


[Figure 4 labels: power in dBm vs. 2V_sig in mV (50 to 500 mV), with first-order, second-order, and third-order curves; modulation depth m (0.2 to 0.8) is marked for P_FM = 2.5 mW.]

Fig. 4—Fundamental, second-, and third-order intermodulation components vs 2V_sig
at P_FM = 2.5 mW and 5 mW for an 8-µm stripe laser. Modulation depth m is also shown.

conditions, the SBH laser also has the advantages of a narrow beam
divergence and a single longitudinal mode of operation. Thus, it is well
suited for applications requiring amplitude modulation.


IV. EXCESS NOISE 


Since the laser is a regenerative amplifier saturated by spontaneous 
emission noise, the noise present in its output intensity rises above the 
shot-noise level near threshold and, ideally, it falls back to the shot- 
noise level above threshold. This excess-noise behavior has been 
observed in some junction lasers,’’ but in others, especially those with 
nonlinear L-I characteristics, excess noise persisted above threshold.
Since most of our lasers showed some degree of nonlinearity, we 
measured the noise behavior of these lasers as part of a process before 
packaging. 

The excess-noise ratio is defined as the ratio of intensity noise from
the laser to the shot noise expected for the same average photocurrent.
(This was called relative-noise ratio in Ref. 17.) Since the noise from
an injection laser is known to be independent of frequency from a few
megahertz up to its resonance frequency near 1 GHz, the noise was
measured at 30 MHz using the apparatus shown in Fig. 5. First, the
total power emitted from the front mirror of the laser (P_FM) was
measured as the drive current was increased. The output voltage of
the p-i-n photodiode packaged with the laser to monitor the emission
from the back mirror (V_BM) was recorded at discrete levels of P_FM. This
measurement provided sufficient data to infer P_FM from the back-mirror
signal, even in the presence of drift in threshold. Next, the laser
was driven in its LED regime (P_FM = 0.2 mW) and the front-mirror
emission was focused through a neutral-density filter onto an APD. The
APD output was amplified and measured with a spectrum analyzer.
The spectrum analyzer measured the sum of the avalanche-multiplied
noise from the laser light and the dark current from the APD and the
thermal noise from the amplifier. The noise from the latter was at least
15 dB below the other noise sources. Thus the amplifier thermal noise
could be neglected in calculating the excess-noise ratios. The drive
current to the laser was then increased, and, at discrete levels of P_FM,
the neutral-density filter was adjusted to attenuate the optical signal
until the average photocurrent of the APD (i_APD in Fig. 5) was the same
as that when the laser was driven as an LED. The difference in noise
power (expressed in dBm) read off the spectrum analyzer is the excess-noise
ratio in decibels.

Fig. 5—Diagram of the apparatus used to measure excess noise from the junction
laser.
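
The last step of this procedure reduces to a subtraction of two analyzer readings taken at equal average APD photocurrent. A minimal sketch follows; the function names and the example readings are hypothetical and serve only to illustrate the bookkeeping.

```python
def excess_noise_ratio_db(noise_lasing_dbm, noise_led_dbm):
    """Excess-noise ratio in dB from two spectrum-analyzer readings (dBm).

    Both readings are assumed to be taken at the same average APD
    photocurrent, with the LED-regime reading as the shot-noise reference.
    """
    return noise_lasing_dbm - noise_led_dbm


def db_to_power_ratio(db):
    """Convert a decibel value to a linear power ratio."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    # Hypothetical readings: -80.0 dBm while lasing, -81.5 dBm in the LED regime.
    ratio_db = excess_noise_ratio_db(-80.0, -81.5)
    print(f"excess-noise ratio: {ratio_db:.1f} dB "
          f"(noise power x{db_to_power_ratio(ratio_db):.2f})")
```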

The excess-noise ratios of two lasers, one with a linear and the other
with a nonlinear L-I characteristic, are shown in Fig. 6. The excess
noise peaked near lasing threshold. A small secondary peak appeared
near the power level where the L-I characteristic became nonlinear,
similar to what has been observed by others.'” '® Many lasers selected
for this experiment showed fairly linear L-I characteristics up to P_FM
= 4 mW, and no secondary noise peak was observed within this power
level. For the 17 lasers measured, the excess-noise ratio decreased to
less than 1.5 dB when the lasers were driven at P_FM = 2 mW (the
average power for all the packages). However, with a large modulation 
index (~0.8), the laser will be driven down toward threshold for a 
fraction of a modulation cycle, and more excess noise will be added to 
the system during this period. How this effect will influence the overall 
system performance is not clear at this time. 


V. BURN-IN RESULTS 


Nine lasers were selected for packaging based on their relatively low 
IM distortion, small excess-noise ratios, and absence of severe kinks in 
their L-I characteristics at both mirrors below P_FM = 4 mW.

All completed packages were mounted in a life-test rack and oper- 
ated for 300 h at room temperature (18 to 26°C). The photocurrents of 
the p-i-n diode in the tap, the p-i-n diode in the laser package moni- 
toring the back laser mirror, and a third p-i-n diode monitoring the 
actual fiber output were monitored. In addition, the drive and bias 
currents were recorded semiautomatically in a digital format during 
burn-in. The percent changes of the output power from the fiber
(ΔP_fiber/P_fiber), of the back-mirror p-i-n voltages (ΔV_BM/V_BM), and the
bias current (I_B) are shown in Table I, together with the second- and
the third-order intermodulation products and excess-noise ratios measured
at P_FM = 2 mW. Among the nine packages completed, six showed
less than 2-percent change in the bias current (I_B) during burn-in. Part
of the change in I_B was due to variations in the ambient temperature
and is not necessarily indicative of deterioration of the lasers. Consequently,
these six packages were used in the field experiment.

Fig. 6—Laser power (P_FM) and the excess-noise ratio (S) vs drive current.

One package (L-2) developed large changes in fiber output, back-
mirror emission, and bias current, while the tap photocurrent was kept 
constant by the feedback circuit. The cause of this instability was 
probably mechanical instability in the tap. The remaining two pack- 
ages failed in less than seven hours for unknown reasons. 




Table I—Results of 300 h burn-in at room temperature, and second-
and third-order intermodulation products and excess-noise ratios at
P_FM = 2 mW

Package  ΔP_fiber/P_fiber  ΔV_BM/V_BM  I_B      2nd-order        3rd-order        Excess-noise
         (±%)              (±%)                 intermodulation  intermodulation  ratio (dB)
                                                (-dB)            (-dB)
L-1      2                 5.3         82-84    38               26               1.2
L-2      7.5               8.6         80-89    32               17               1.1
L-3      1.4               2.6         97-101   31               40               0.8
L-4      0.8               1.1         76-78    34               46               1.5
L-5      1.4               10.2        86-87    29               34               1.3
L-7      1.2               1.9         110-112  20               35               1.4
L-9      1.8               2.1         83-85    16               30               0.1


VI. SUMMARY AND DISCUSSION 


Lightwave transmitters capable of delivering frequency-modulated
subcarrier IF signals (70 MHz ± 20 MHz) were developed for use in
satellite entrance links. Double-heterostructure GaAlAs injection lasers
with 12-µm stripes were used. At 2-mW laser power, the excess-noise
ratio from the lasers was not more than about 1.5 dB. Second-order
IM products were 16 to 40 dB and third-order IM products were
17 to 46 dB below the fundamental at a modulation index of 0.8. The
average power coupled into the FT3 fiber (NA = 0.23, core dia = 55 µm)
varied from 0.6 to 0.8 mW.

The third-order IM products observed for many 12-µm stripe lasers
were sufficiently low for possible applications in multi-carrier transmission.
However, the magnitude of distortion may increase as the
lasers age, since they tend to develop nonlinearities in their L-I
characteristics. In principle, reduction of the nonlinear
distortion from direct modulation of injection lasers can be approached
from two directions: electronic compensation for the nonlinearity and/or
modification of the laser structure. Use of a predistorted signal
derived from one laser and applied to a second laser’’ to minimize the
distortion products has been successful with LEDs, but it may be
difficult with injection lasers since two lasers cannot be expected to
develop the same nonlinearity at the same time. The approach using
improved laser structures has been more successful. Already various
structures have been reported in the literature’®*°” showing extremely
linear characteristics. With these new, linear structures, ultimately
the magnitude of the IM products and the useful bandwidth for
analog modulation will be determined by the inherent resonance of
these lasers, which is in the vicinity of 1 GHz. Due to these resonances,
distortion would be expected to appear at subcarrier frequencies of
several hundred megahertz even for a laser with perfectly linear L-I
characteristics”’ as measured at dc (or low frequencies).


Adaptive Aperture Coding for
Speech Waveforms—I 


By N. S. JAYANT and S. W. CHRISTENSEN 


(Manuscript received December 29, 1978) 


In aperture coding, one refrains from encoding waveform samples 
until the waveform crosses an appropriately wide aperture centered 
around the last encoded value. If the waveform is slowly varying in 
some sense, the above procedure can be a basis for bit rate reduction. 
The identification of aperture-crossing samples can be either explicit 
or implicit, and it is the latter case that this paper mainly addresses.
We follow a finite length, converging-aperture procedure proposed 
recently for picture waveforms, and show that it can be used for 
speech coding as well if the aperture width is designed to be syllabi- 
cally adaptive. We also describe, for Nyquist-sampled speech, desir- 
able designs for aperture shape and aperture length L. The special 
case of L = 1 corresponds to ternary delta modulation with a constant 
encoding rate of log2 3 ≈ 1.6 bits/sample. Using longer apertures
(e.g., L = 2, 3), we show that it is possible to obtain average encoding 
rates as low as 1.2 bits/sample without significantly changing output 
speech quality. With 8- to 12-kHz sampling, the average bit rate
would then be 9.6 to 14.4 kb/s. At these transmission rates, adaptive 
aperture coding, used in conjunction with a simple (first-order) adap- 
tive predictor, can provide communications quality speech. 


I. INTRODUCTION 


The encoding technique described in this paper is intended to be a 
simple time-domain approach for encoding speech waveforms at trans- 
mission rates like 9.6 or 16 kb/s. The digital speech output resulting 
from this technique, or simple modifications thereof,° is expected to be 
of communications quality: less than toll quality, but nevertheless 
adequate for many applications. 

The notion of aperture coding, per se, is not new. It has been 
considered extensively for digitizing telemetry data, with a view to 
exploiting their slowly changing characteristics.’* The point of this
paper is that aperture coding can be useful for low-bit-rate digitizations 
of speech waveforms as well, provided the coding procedure is designed 
to be properly adaptive to the changing statistics of speech inputs. In 
fact, an important contribution of this paper is the specification of a 
rather carefully designed syllabic adaptation algorithm for aperture 
width. 

Adaptive aperture coding is inherently a variable rate procedure, 
and for use with a transmission channel that expects a constant-rate 
output, one would need an appropriate buffer at the coder output. 
Typical buffer lengths and consequent encoding delays can be several 
tens of milliseconds. This will be of no concern when aperture coding 
is used for digital speech storage but, for transmission applications, 
the encoding delay will be an important consideration. 


II. APERTURE CODING


The basic notion can be explained with reference to Fig. 1. Assume
that the waveform sample at time 0 has been encoded and transmitted.
The idea now is to view the immediate future of X(t) through an
aperture of width 2A, centered on the circle that represents the
transmitted value at time 0, and to refrain from transmitting samples
that lie within this aperture; the next transmission will therefore occur
at time 3, after which the process continues with an updated aperture.
Here, and in the next figure, open circles represent transmitted values,
while solid dots denote samples deemed redundant. In reconstructing
the waveform, redundant samples can be assigned amplitudes equal
(for example) to that of the last transmitted sample, as shown by the
dashed horizontal running through the aperture. This procedure entails
a distortion that can be referred to as aperture noise. As one
increases A, aperture noise increases, but so does the proportion of
samples that need not be encoded/transmitted. The tradeoff between
noise and transmission probability depends on how slowly the input
waveform varies, and for nonstationary inputs such as speech waveforms,
the best tradeoffs are realized in schemes where one adapts A
to changing input statistics. [With nonadaptive aperture schemes for
Nyquist-sampled speech, a transmission probability of 1 out of 2 (or
2.5, 3, 4, 5) samples implies typical signal-to-aperture-noise ratios of
about 33 (or 21, 18, 14, 11) dB, assuming that the only silences present
in the speech input are naturally occurring microsilences, and not
explicit pauses.]

Fig. 1—Illustration of the aperture coding concept.
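
The basic (nonadaptive) scheme of Fig. 1 can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the transmitted samples are left unquantized, and timing is conveyed explicitly by sending the sample index, which is exactly the overhead discussed in the paragraphs that follow.

```python
def aperture_encode(samples, half_width):
    """Return (index, value) pairs for samples that fall outside the aperture.

    The aperture has width 2 * half_width and is centered on the last
    transmitted value. The first sample is always transmitted.
    """
    transmitted = [(0, samples[0])]
    last = samples[0]
    for n, x in enumerate(samples[1:], start=1):
        if abs(x - last) > half_width:   # aperture crossing
            transmitted.append((n, x))
            last = x                     # re-center the aperture
    return transmitted


def aperture_decode(transmitted, length):
    """Rebuild a waveform by holding the last transmitted value."""
    sent = dict(transmitted)
    out, last = [], None
    for n in range(length):
        if n in sent:
            last = sent[n]
        out.append(last)                 # redundant samples repeat `last`
    return out


if __name__ == "__main__":
    x = [0.0, 0.2, -0.3, 1.5, 1.6, 0.0]
    code = aperture_encode(x, half_width=1.0)
    print(code)                          # samples 1 and 2 are redundant
    print(aperture_decode(code, len(x)))
```

As in Fig. 1, the first transmission after time 0 occurs at time 3 in this example, and the two intervening samples are reconstructed at the level of the last transmitted value.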

Practical aperture schemes present two considerations which have 
not been introduced in Fig. 1. First, the “transmitted” samples have to 
be digitized somehow, so that the quality of reproduced speech will be 
characterized by this digitization—or quantization—noise, in addition 
to the aperture noise mentioned earlier. Second, the decoder at the 
receiving end has to know which of the input samples have been 
deemed redundant by the encoder, and which of them have been 
explicitly digitized. Most aperture coding literature’* assumes explicit 
transmission of the above “timing” information. For example, the 
encoder can transmit, for each input sample, a binary number which 
tells the decoder whether that sample is being encoded or deemed 
redundant. If the probability of a nonredundant sample is p and if such
a sample is further encoded using B bits, the average transmission rate
is [p·B + 1] bits/sample, where the term 1 is due to the constant
timing information bit; and the savings, relative to a zero-aperture
scheme, are [B(1 - p) - 1] bits/sample. This formula suggests that B
has to be large enough (for a given p) so that the savings is positive in 
spite of the timing information. On the other hand, in low bit rate 
applications, values of p (that are compatible with a tolerable amount 
of aperture + quantization noise) may be such that the savings due to 
aperture coding are either insufficient or negative—unless, of course, 
the timing information overhead can be avoided altogether. An aper- 
ture scheme which does precisely this was described recently by 
Murakami, Tachibana, Fujishita, and Omura‘ in the context of picture 
coding, and the purpose of this paper is to describe our modification of 
that scheme for encoding speech waveforms with B = 1.2 to 1.6 bits/ 
sample, a range of bit rates which clearly cannot afford explicit 
transmission of timing information. 
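
The rate and savings expressions above are easy to tabulate. The sketch below evaluates them for two illustrative cases; the values of p and B used in the example are chosen only for illustration and are not taken from the paper.

```python
def rate_with_timing_bit(p, bits):
    """Average rate (bits/sample) when one timing bit is sent per sample."""
    return p * bits + 1


def savings_vs_zero_aperture(p, bits):
    """Savings (bits/sample) relative to sending `bits` for every sample."""
    return bits * (1 - p) - 1


def break_even_bits(p):
    """Smallest B for which the savings is no longer negative: B = 1/(1 - p)."""
    return 1 / (1 - p)


if __name__ == "__main__":
    for bits in (8, 1.6):
        print(f"B = {bits}: rate = {rate_with_timing_bit(0.5, bits):.2f}, "
              f"savings = {savings_vs_zero_aperture(0.5, bits):+.2f} bits/sample")
    print(f"break-even B at p = 0.5: {break_even_bits(0.5):.1f} bits")
```

With p = 0.5, an 8-bit coder saves bits despite the timing overhead, while a coder near 1.6 bits/sample loses, which is the motivation for avoiding the explicit timing bit.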

Succeeding sections describe our findings concerning aperture char- 
acteristics that are desirable for low bit rate speech coding. These 
characteristics include aperture shape, aperture length (to be defined 
presently), and adaptation algorithms for aperture width A. 

III. APERTURE CODING WITHOUT EXPLICIT TRANSMISSION OF TIMING
(TIME OF NONREDUNDANT SAMPLE) INFORMATION


Consider the procedure of Fig. 2. The converging nature of the 
aperture is desirable, as we shall note later, but the convergence is not 
critical from the timing information viewpoint. As in Fig. 1, a trans- 
mitted sample at time 0 is followed by two redundant samples. The 
nonredundant sample X(3) is encoded as follows. First, it is quantized
to a level corresponding to the previous nearest point on the aperture 
characteristic (P3, in this case), and this value is transmitted by means 
of a code word dedicated to point P3 of the characteristic. Reception 
of this code word conveys two items of information to the receiver: 
first, that a nonredundant sample was encountered after P3, i.e., at 
time 3; second, that this sample has been quantized to an amplitude 
that is equal to that of P3 itself (as defined relative to the dashed 
horizontal running in the middle of the aperture). Once again, as in 
Fig. 1, the process is repeated with the transmitter (and receiver) 
beginning a new aperture centered on Y(3), the approximation to X(3).

In the above example, a (positive-sided) aperture crossing was
observed at time t = 3. (This event will be denoted when needed as a
“run” of length R = 3.) If, instead, the crossing had been observed at
time t = 1 (run length R = 1), the input X(1) would have been
encoded as a value Y(1), and this would have been represented by a
code word (and amplitude) corresponding to P1 or N1, depending on
whether the crossing was above or below the aperture center. If, on
the other hand, there was no crossing even as late as t = 3 (run length
R > 3), X(3) would be encoded by the central “zero” level Z at the end
of the aperture and a new aperture would be created, centered on Y
= Z.





Fig. 2—Aperture coding without explicit transmission of timing information. 

Table I—Aperture characteristics for L = 3 and J = 0.5

Run length R                       Updating relative
(first time that crossing          to predicted value
is observed)
1                                  ±A0
2                                  ±A0/√2
3                                  ±A0/2
>3 (no crossing observed)          0


The use of a “zero” output level implies the use of a finite-length 
aperture. In fact, the aperture length can be defined as the time at 
which the aperture is truncated with a zero output level Z. In Fig. 2, 
L = 3, and in the particular example that has been sketched, R = 3 as 
well. 

The number of “output” points on the aperture characteristic of Fig.
2 was 7 (3 P’s + 3 N’s + 1 Z). In general, the aperture characteristics in
our scheme are described by (2L + 1) distinct outputs, and corresponding
transmission code words.

Relative amplitudes on the characteristic are determined by the 
shape of the aperture. We have found that converging apertures that 
are appropriate for speech can be conveniently formalized into the 
class 


\[ A(t) = A_0 \cdot 2^{-Jt}, \tag{1} \]


where A(t) is the aperture width at time t. We have further found that
a desirable range for J is (0.5 ≤ J ≤ 1). (We have also looked at shapes
described by complete convergence, A(L) = 0, with corresponding (2L
+ 1)-point characteristics, but we have found them to be less useful
than those described by the exponential decay above.)
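As a rough illustration, the output amplitudes of an aperture characteristic can be tabulated from the exponential shape A(t) = A0 · 2^(−Jt). This is only a sketch: the function name `aperture_levels` is hypothetical, and the rule that a crossing first seen at run length R is updated by ±A(R − 1) is an assumption inferred from the entries of Table I, not a statement taken from the text.

```python
def aperture_levels(a0, length, j):
    """Return the (2L + 1) output amplitudes relative to the predicted value.

    Assumption (inferred from Table I): a crossing first observed at run
    length R is updated by +/- A(R - 1), where A(t) = a0 * 2 ** (-j * t).
    """
    positive = [a0 * 2.0 ** (-j * (r - 1)) for r in range(1, length + 1)]
    negative = [-p for p in positive]
    # P1..PL, then the central "zero" level Z, then N1..NL.
    return positive + [0.0] + negative


levels = aperture_levels(a0=1.0, length=3, j=0.5)
for value in levels:
    print(f"{value:+.4f}")
```

With L = 3 and J = 0.5 the positive levels come out as A0, A0/√2, and A0/2, and the list has 2L + 1 = 7 entries.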

Table I defines the quantization characteristics of aperture coding
for the illustrative case of L = 3 and J = 0.5. Notice that the output
(quantized) amplitudes are defined relative to a “predicted value.” In
the examples of Figs. 1 and 2 we have assumed that all predicted
values are equal in amplitude to the last (explicitly) transmitted
amplitude, as indicated by the dashed horizontals running through the
aperture areas. The situation corresponds, formally, to a first-order
predictor with a coefficient a1 equal to unity.* In general, however,
one can use a speech-specific predictor a1 = 0.85, or a higher-order
predictor (for example, a1 = 1.10, a2 = −0.28, a3 = −0.08; see Ref. 5) to
reconstruct redundant samples, and to predict nonredundant samples
prior to updating, as in Table I. The coding procedures in these more
general cases would be qualitatively described by Fig. 3. Further, the
predictor can also be adaptive, to follow the changing spectral
characteristics of input speech. The adaptive predictors considered in this
paper are of first order, in the interest of simplicity: the adaptive
predictor coefficient is simply set equal to the one-sample-lag
autocorrelation value c1 of the speech input. The parameter c1 is updated once
for every 256-sample input block. Explicit transmission of c1 values to
a receiver will typically entail an additional information transmission
of about 100 to 200 bits per second. This extra transmission can be
entirely avoided in schemes where c1 is estimated from a past history
of reconstructed, rather than input, speech.°

* Nonpredictive aperture coding results if a1 = 0.

Fig. 3—Aperture coding with (a) first-order prediction: a1 = 1, and (b) more general
prediction.
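The role of the predictor coefficients can be sketched in a few lines. The helper `predict` and the sample values are hypothetical; only the coefficient sets (0.85, and 1.10, −0.28, −0.08) come from the text. The predicted value is the weighted sum of the most recent reconstructed samples, newest first.

```python
def predict(history, coeffs):
    """Linear prediction: coeffs[0] weights the newest sample, coeffs[1] the one before, etc."""
    return sum(a * history[-1 - k] for k, a in enumerate(coeffs))


FIRST_ORDER = [0.85]                 # speech-specific first-order predictor
THIRD_ORDER = [1.10, -0.28, -0.08]   # higher-order example quoted in the text

recent = [100.0, 120.0, 130.0]       # oldest ... newest (illustrative values)
print(predict(recent, FIRST_ORDER))
print(predict(recent, THIRD_ORDER))
```

Setting the single coefficient to 1 reproduces the "hold the last transmitted amplitude" behavior assumed in Figs. 1 and 2, and setting it to 0 gives the nonpredictive case of the footnote.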


IV. ADAPTIVE APERTURES 


Nonadaptive and adaptive apertures are sketched in Fig. 4. The
figures show the time evolution, if any, of the maximum aperture width
A0 [= A(0)]. For a nonstationary signal such as speech, it is critical to
have an adaptive procedure such as in Fig. 4c. The adaptations would
let A0 follow changing input statistics and provide individually tailored
arrangements for encoding high-level voiced segments, low-level
sounds such as fricatives, and zero-level microsilences. The results of
this paper assume that the aperture shape as described by J in (1) is
fixed, and that only the width A0 is adaptive.

We studied many adaptation algorithms, including those that can 
be described as instantaneous, periodic, and syllabic. The best results 
were obtained with syllabic adaptations as typified by the algorithm 


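\[
A_0^{(r)} = G_1 \cdot A_0^{(r-1)} + G_2 \cdot (\mathrm{ADAPT}),
\qquad G_1 = 1 - \epsilon, \quad \epsilon \to 0,
\]
\[
(\mathrm{ADAPT}) =
\begin{cases}
1 & \text{if } \sum_{s=1}^{4} R^{(r-s)} < K, \\
0 & \text{otherwise.}
\end{cases}
\tag{2}
\]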


[Figure 4 panels: aperture width 2A0 versus time; panel (c) shows adaptive apertures.]

Fig. 4—Nonadaptive and adaptive apertures.
In the above algorithm, r indexes successive new apertures, and not
successive input samples. In other words, going from r to r + 1 could
mean an interval up to L input samples. The parameter R refers to the
run length of redundant samples. By defining a “Z” event as a run of
length L + 1, the parameter R is seen to have a range (1 ≤ R ≤ L + 1).
Briefly, the above adaptation logic uses a succession of four small runs
as a cue for increasing A0; in the absence of such a cue, the logic lets
A0 decrease slowly, at a rate given by G1. Our experiments have shown
that desirable values of G1 for 8-kHz-sampled speech are between 0.95
and 0.99 (corresponding to syllabic time constants of 2 to 8 ms for
aperture decreases); while good choices for the threshold K are 6, 6,
and 8 for aperture lengths of L = 1, 2, and 3 respectively. The most
interesting parameter in (2) by far was the quantity G2 that determines
the nature of aperture width increases, and we shall come back to this
parameter presently.
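The adaptation logic just described (decay by G1 at every new aperture, plus an increment G2 when the four most recent runs are short) can be sketched as follows. The function name `update_aperture` and the numeric values in the example are illustrative assumptions; the exact indexing of the four runs in (2) is not reproduced here.

```python
def update_aperture(a0, last_four_runs, g1, g2, k):
    """One step of the syllabic adaptation rule.

    a0             -- current maximum aperture width
    last_four_runs -- the four most recent run lengths (each between 1 and L + 1)
    g1             -- decay factor, slightly below 1
    g2             -- additive increment applied when the runs are short
    k              -- threshold on the sum of the four run lengths
    """
    adapt = 1 if sum(last_four_runs) < k else 0
    return g1 * a0 + g2 * adapt


# Illustrative values: L = 3, so K = 8; G1 = 0.98 and G2 = 45 are example settings.
grown = update_aperture(1000.0, [1, 1, 1, 1], g1=0.98, g2=45.0, k=8)
decayed = update_aperture(1000.0, [4, 4, 4, 4], g1=0.98, g2=45.0, k=8)
print(grown, decayed)
```

Four short runs (frequent crossings) indicate that the aperture is too narrow for the current signal, so the width grows; otherwise it leaks toward zero, which is what lets A0 settle at very low values during microsilences.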

Meanwhile, Figs. 5, 6, and 7 provide illustrative descriptions of the
adaptive procedure described in (2). Figure 5 shows how the aperture
width A0 (5b) tracks input speech power (5a); Fig. 6 provides a typical
histogram of A0 samples, showing a microsilence-related concentration
at very low values of A0; and Fig. 7 compares typical aperture-crossing
sequences (redundant and nonredundant samples, versus time) in
nonadaptive (7b) and adaptive (7c) schemes. In the 2-state sequences
of Figs. 7b and 7c, a zero state represents a redundant sample, while a
nonzero state denotes an aperture crossing, or nonredundant sample.


[Figure 5 panels: (a) envelope of speech amplitude, max = 15487, min = −10653; (b) envelope of aperture width, max = 4768, min = 0; horizontal axis: time (sample number), 1 to 20000.]

Fig. 5—Syllabically adaptive apertures: envelopes of (a) sentence-length speech
and (b) aperture width A0.
Fig. 6—Histogram of aperture width samples.

[Figure 7 panels: sample numbers 3585 to 4096; redundant and nonredundant samples marked versus sample number.]

Fig. 7—Aperture crossings in (b) nonadaptive and (c) adaptive schemes, corresponding
to (a) a speech waveform segment.

V. PERFORMANCE OF APERTURE CODING AS A FUNCTION OF L AND G2


The more interesting results of our experiments (computer simulations)
are summarized in Figs. 8 and 9. These results apply to a
bandlimited (200 to 3200 Hz) female utterance, “THE CHAIRMAN CAST
THREE VOTES,” and the nonadaptive third-order predictor [a1 = 1.10,
a2 = −0.28, a3 = −0.08] mentioned earlier. The case of adaptive
prediction is discussed briefly in Section VI.


5.1 Segmental signal-to-noise ratios 


The objective speech quality measure used in Fig. 8 is the segmental
signal-to-noise ratio SNRSEG obtained by computing the s/n ratios in
256-sample (32 ms) blocks, expressing the values in decibels, and taking
the average of local decibel values over the length of the sentence-length
input, a procedure which reflects low-level speech rendition
better than the conventional average s/n ratio. It is significant that
the maximum performance with L = 3 is nearly 1 dB below the peak
performance with L = 1 and L = 2; and that for a given value of G2,
L = 2 tends to perform better than L = 1 (except if G2 < 25). It will be
seen, on the other hand, that the values of G2 that are interesting from
a transmission-rate standpoint are quite different for different values
of L, and we presently reexamine the results of Fig. 8 taking this fact
into account.
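The difference between the segmental measure and the conventional s/n ratio can be seen in a small sketch. The function names and the toy signal are hypothetical; skipping blocks with zero signal or zero noise power is an assumption made here to keep the logarithm finite, and is not something stated in the text.

```python
import math


def snr_db(signal, coded):
    """Conventional s/n ratio over the whole input, in decibels."""
    noise = sum((s - c) ** 2 for s, c in zip(signal, coded))
    power = sum(s * s for s in signal)
    return 10.0 * math.log10(power / noise)


def snrseg_db(signal, coded, block=256):
    """Average of per-block s/n ratios, each expressed in decibels first."""
    values = []
    for start in range(0, len(signal) - block + 1, block):
        s = signal[start:start + block]
        c = coded[start:start + block]
        noise = sum((a - b) ** 2 for a, b in zip(s, c))
        power = sum(a * a for a in s)
        if noise > 0.0 and power > 0.0:   # assumption: skip degenerate blocks
            values.append(10.0 * math.log10(power / noise))
    return sum(values) / len(values)


# Toy input: a loud block coded at about 20 dB and a quiet block coded at 0 dB.
signal = [1.0] * 256 + [0.01] * 256
coded = [0.9] * 256 + [0.0] * 256
print(snr_db(signal, coded), snrseg_db(signal, coded))
```

In this toy case the conventional ratio stays just under 20 dB because the loud block dominates, whereas the segmental value is about 10 dB: the poorly rendered quiet block is weighted equally.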


[Figure 8 axes: SNRSEG in decibels versus G2 (10 to 1000; note: |X|max = 13000), one curve per aperture length L.]

Fig. 8—Variation of speech quality SNRSEG with adaptation parameter G2, for L = 1,
2, and 3 (8-kHz sampling; nonadaptive prediction).
[Figure 9 axes: bits per sample versus G2 (10 to 2000; note: |X|max = 13000), for aperture lengths L = 1, 2, and 3, with and without Huffman coding; numbers along the curves are SNRSEG values in decibels.]

Fig. 9—Variation of transmission rate (bits/sample) with adaptation parameter G2,
for L = 1, 2, and 3 (8-kHz sampling; nonadaptive prediction).


Meanwhile, it should be noted that, for a given value of L, and in the
neighborhood of an SNRSEG-maximizing G2 value, increasing G2 tends
to make the output speech more granular and harsh; while decreasing
G2 tends to make the speech more low-passy and muffled. Finally, the
“design” parameter in Fig. 8 is strictly the ratio of G2 to |X|max, the
maximum input speech magnitude, rather than the absolute value of
G2.


5.2 Average transmission rates 


The average (information) transmission rate in an aperture coding 
scheme is upper-bounded in the form 


\[ I(L) \le p \cdot \log_2 (2L + 1) \ \text{bits/sample}, \tag{3} \]


where p is the probability of a nonredundant sample and (2L + 1) is
the number of distinct output points on the aperture characteristic.
The inequality above recognizes the fact that the (2L + 1) output
code points, in general, have unequal probabilities of being used, so
(for a given p < 1) further information compression can be achieved
by assigning relatively short code words to frequent outputs and using
relatively long code words for the infrequently occurring outputs.
Thus, in the example of Table II, the variable-length Huffman coding
results in an average bit rate of 1.34 < 1 · log2 3 = 1.59 bits/nonredundant
sample. Note that for L = 1, p = 1 by definition because the 3-point
aperture characteristic always puts out a nonredundant output
corresponding to P1, N1, or Z. In fact, this special case is no more than
ternary (3-level) delta modulation, and the output has a constant rate
of log2 3 ≈ 1.6 bits/sample. The effect of Huffman coding, however, is
to make the output bit rate variable. With L = 2 or 3, on the other
hand, the output rate is variable even without Huffman coding, because
p < 1 in general, for these cases. Information rates for L = 1, 2, and 3
are shown in Fig. 9 as a function of G2, with and without entropy
(Huffman) coding in each case.
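The arithmetic behind the bound and the Huffman example can be checked with a short sketch. The helper names are hypothetical and the Huffman routine is a generic textbook construction, not the authors' program; the probabilities are those of Table II.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return the Huffman code-word length for each symbol probability."""
    # Heap entries: (probability, tie-breaker, indices of symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:            # every merge adds one bit to these symbols
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def rate_bound(p, length):
    """Upper bound p * log2(2L + 1) on the rate in bits/sample."""
    return p * math.log2(2 * length + 1)


probs = [0.17, 0.17, 0.66]           # P1, N1, Z for L = 1 (Table II)
lengths = huffman_lengths(probs)
average = sum(p * n for p, n in zip(probs, lengths))
print(lengths, average, rate_bound(1.0, 1))
```

The two rare outputs receive 2-bit words and the frequent "zero" output a 1-bit word, giving 1.34 bits/sample against the fixed-length figure of log2 3, about 1.59.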

Figure 9 also includes, for convenience, the SNRSEG information from
Fig. 8. For an average bit rate of 1.6 bits/sample, ternary delta
modulation (L = 1) without Huffman coding is an obvious choice:
there is no motivation for aperture coding and the attendant variability
in the encoder output rate. For average bit rates of about 1.4 bits/
sample, one has the choice: L = 1 with Huffman coding or L = 2
without Huffman coding. It is apparent that, for the greatest reductions
of information rate (say, I(L) = 1.2 bits/sample), one needs to employ
nontrivial (L > 1) aperture coding, an observation that is also suggested
by the literature on adaptive asynchronous delta modulation.’ In our
scheme, the justification for L = 3 comes directly from the fact that
values of G2 that realize 1.2 bits/sample encoding are far too suboptimal
(SNRSEG-wise) in the cases of L = 1 and L = 2 (see Fig. 8).


VI. ADAPTIVE PREDICTION


We have studied the performance of an adaptive aperture coding
scheme where the waveform predictor is also adaptive. In the interest
of simplicity, we have confined our studies to the case of first-order
prediction. In this case, the adaptive prediction procedure is simply to
compute the adjacent sample correlation c1 of input speech samples X
(typically, once for each input block of 256 samples), and to set the
predictor coefficient a1 equal to c1:

\[ a_1 = c_1 = \frac{E(X_r X_{r-1})}{E(X_r^2)}; \qquad E(\cdot): \text{expected value.} \tag{4} \]

The use of adaptive prediction did not increase the SNRSEG values
of Fig. 9 drastically, but perceptual improvements in coded speech
quality were quite significant. The resulting speech output with 1.2 to
1.6 bits/sample aperture coding has communications quality: the
degradation is obvious in a direct comparison with the input speech,
but the quality should nevertheless be adequate for many communication
purposes. The output speech quality also varies with input
speech: with certain types of input, the output speech is highly
intelligible even with nonadaptive prediction and 8-kHz sampling. The
speech quality, however, improves significantly with adaptive prediction
and faster sampling (say, 12 kHz), and with adaptive low-pass
filtering of the output.® Finally, in informal comparisons with adaptive
delta modulation (ADM) at a given bit rate, adaptive aperture coding
is clearly better, as expected.

Table II—Huffman coding example (L = 1, G2 = 45)

  Sign    Run length    Probability    Code word
  ------------------------------------------------
  +       1             0.17           00
  −       1             0.17           01
          >1            0.66           1

  Transmission rate = 0.17 · (2) + 0.17 · (2) + 0.66 · (1)
                    = 1.34 bits/sample

[Figure 10 axes: probability versus number of transmissions (NTRANS), 60 to 180; block length 256, aperture length 3.]

Fig. 10—Histogram of the number of transmissions in a 256-sample block (L = 3).
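The block-adaptive first-order predictor of this section can be sketched as below. The function names are hypothetical, and estimating the expected values by sums over one 256-sample block (normalized by the block energy) is an assumption about how the one-sample-lag correlation would be computed in practice.

```python
def lag1_correlation(block):
    """Normalized one-sample-lag autocorrelation estimate for one block."""
    num = sum(block[i] * block[i - 1] for i in range(1, len(block)))
    den = sum(x * x for x in block)
    return num / den if den > 0.0 else 0.0


def block_coefficients(samples, block=256):
    """One predictor coefficient a1 = c1 per complete block of input samples."""
    return [
        lag1_correlation(samples[start:start + block])
        for start in range(0, len(samples) - block + 1, block)
    ]


smooth = [1.0] * 256                              # slowly varying input
rough = [(-1.0) ** i for i in range(256)]         # rapidly alternating input
print(block_coefficients(smooth + rough))
```

A slowly varying (voiced-like) block yields a coefficient close to +1, while a rapidly alternating block yields a coefficient close to −1, which is the behavior a fixed coefficient such as 0.85 cannot follow.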


VII. VARIABILITY OF BIT RATE IN THE APERTURE CODER OUTPUT


Aperture coding schemes, with the exception of the special case of
ternary delta modulation without Huffman coding, generate variable-rate
outputs. For example, Fig. 10 shows a histogram of sample values
of NTRANS, the number of transmissions per 256-sample block, in a
scheme with L = 3. Note that the nonredundant sample probability p
varies in the range 0.31 < p ≤ 0.66.

If a variable-rate procedure is used to decrease the average bit rate,
one needs the additional provision of a bit buffer to be able to deliver
bits at a constant rate into a channel (that accepts them in that
format). The necessary length of such a buffer can be equated to the
peak-to-peak variation of the quantity

\[ \sum_{u=1}^{N} B_u - B N, \tag{5} \]

where B_u is the number of bits used to encode speech sample u (B_u ≥ 0),
N is the total number of speech samples in a (statistically long enough)
test input, and B is the average bit rate (bits/sample) needed to
transmit that test input.

Using the sentence-length utterance mentioned earlier, we evaluated
the peak-to-peak excursion of (5) for three cases: (i) L = 1 plus
Huffman coding, G2 = 35; (ii) L = 2 without Huffman coding, G2 = 35;
and (iii) L = 3 with Huffman coding, G2 = 45. Cases (i) and (ii)
correspond to B = 1.4, and case (iii) is B = 1.2. Respective buffer
requirements were approximately 600, 400, and 800 bits. Respective
encoding delays (for 8-kHz sampled speech and appropriate B values)
are approximately 50, 35, and 80 ms.
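The buffer and delay figures can be reproduced approximately with the sketch below. Two assumptions are made here and are not stated in the text: the excursion of (5) is tracked as a running sum over the sample index, and the delay is taken as the buffer length divided by the constant channel rate B × 8000 bits/s. The function names and the toy bit sequence are hypothetical.

```python
def buffer_excursion(bits_per_sample, average_rate):
    """Peak-to-peak excursion of the running sum of (B_u - B)."""
    level = 0.0
    lowest = highest = 0.0
    for bits in bits_per_sample:
        level += bits - average_rate
        lowest = min(lowest, level)
        highest = max(highest, level)
    return highest - lowest


def delay_ms(buffer_bits, average_rate, sampling_hz=8000):
    """Time needed to drain a full buffer at the constant channel rate."""
    return 1000.0 * buffer_bits / (average_rate * sampling_hz)


# Toy sequence: an active stretch (2 bits/sample) followed by an idle one.
toy_bits = [2, 2, 2, 2, 0, 0, 0, 0]
print(buffer_excursion(toy_bits, average_rate=1.0))

for bits, rate in [(600, 1.4), (400, 1.4), (800, 1.2)]:
    print(bits, rate, round(delay_ms(bits, rate), 1))
```

Under these assumptions the three cases give roughly 54, 36, and 83 ms, in line with the quoted values of approximately 50, 35, and 80 ms.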

For speech transmission applications, the above delays are significant
if not prohibitive. Furthermore, practical designs of aperture
coding should specify a maximum buffer length, and should
include an automatic procedure’ for increasing or decreasing the local
average rate B, depending on current buffer status as given by (5).
Clearly, the parameter G2 would be a natural means for controlling
local values of B.

In multiplex-speech situations, active (high output bit-rate) and 
inactive (low output bit-rate) segments get more intermixed in time 
than with a single speech channel. Consequently, buffering problems 
are expected to be less severe with multiplex-speech inputs. In fact, 
there is at least one “digital TASI” application, SPEC (Speech Predictive 
Encoded Communications), which indeed employs a simple form of 
aperture coding.” 

The most straightforward application of aperture coding will perhaps 
be in the context of speech storage. In storage applications, encoding 
delays are less objectionable than in transmission, and buffer overflow 
problems, if any, need not be combatted in real time. 
Statistical Block Protection Coding
for DPCM-Encoded Speech 


By R. STEELE, N. S. JAYANT, and C. E. SCHMIDT 
(Manuscript received January 3, 1979) 


Blocks of speech-carrying DPCM bits are protected from transmission
errors by means of explicit communication of two block statistics:
the maximum and the root-mean-square (rms) values of the
adjacent-sample differences in the DPCM-quantized speech. At the
receiver, the maximum value is used as a cue for error-detection,
while the rms value is used for a partial waveform correction procedure
that provides intelligible speech at bit error rates as high as 10
percent.


I. INTRODUCTION 


Block protection coding, whereby a block of data words is protected 
by the addition of special code words or letters, is a common feature 
in communication systems for noisy channels. In algebraic error detec- 
tion and correction, for example, the protection is derived from parity 
check bits. The number of parity checks, and hence the redundancy, 
increases with the number of data bits protected, but the resulting 
error-coding procedures are quite general, being applicable to any type 
of data, irrespective of its source. Nevertheless, with sources such as 
speech, where it is not crucial to recover every speech-carrying bit 
without error, it is meaningful to look for certain special, compact 
forms of non-algebraic block protection. The idea is to transmit a 
protection word that identifies some perceptually significant parameter 
of a speech-waveform segment; knowing the (correct) value of this 
parameter, the receiver can perform error-detecting and error-correct- 
ing operations, which may be only partial in an algebraic sense (due to 
the compactness of the protecting procedure) but nevertheless quite 
adequate from a speech-perception viewpoint. 

In one recent investigation’ along these lines, each block of differential
PCM words was protected by a reference PCM word that signified
the speech amplitude at the end of the block. Error detection was
based on comparing the DPCM decoder amplitude at the end of the
block with the PCM reference. Procedures for locating (and correcting)
errors within the block were simple for single errors, but fairly involved
for multiple errors in a block. A very successful error-location procedure
had however been noted in an earlier investigation;² this depended
on the detection of a statistically unlikely change between adjacent
samples in the corrupted speech signal, relative to the root-mean-square
(rms) value of these differences measured over a suitably long
block containing these samples. The rms parameter in Ref. 2 was
obtained from the corrupted speech, and this affected the success of
the procedure at high error rates (say, 5 percent or higher).

The scheme to be described in this paper recognizes and extends the 
statistical notions of Ref. 2 and incorporates them in a block protection 
system that is effective even at error rates as high as 10 percent. This 
statistical block protection coding (SBPC) system is discussed for the 
specific case of non-adaptive DPCM, but extension to an adaptive
system should be possible, at least in cases where the (step size) 
adaptation is slow or syllabic.* 


II. STATISTICAL BLOCK PROTECTION CODING (SBPC)


The SBPC system employs a simple protection code consisting of two
words which represent:

(i) The maximum difference between adjacent locally decoded
speech samples within the block of W samples.

(ii) The rms value of the differences between adjacent locally decoded
speech samples within the block of W samples.

Notice that the extremal statistic (i), together with the central
statistic (ii), constitute a partial description of the PDF (probability
density function) of first differences.


2.1 Transmitter 


The arrangement of the DPCM encoder and the system for generating
the protection code are shown in Fig. 1. Suppose that the mth block
of speech samples is being processed. The input speech sample x_{mW+r},
corresponding to the rth instant in the mth block, is encoded into a
quantized sample q_{mW+r} by a DPCM encoder using a uniform quantizer.
The predictor is of first order, with a coefficient value of LK < 1. Z^{-1}
represents a delay of one sample period.

Denoting the locally decoded speech sample by y_{mW+r}, the protection
code words are defined in the form


* Recent studies have shown that our technique works quite well in conjunction with 
an adaptive procedure where the quantizer step size is constant within a block (several 
milliseconds or tens of milliseconds long) of samples, but is modified once at the 
beginning of each block in response to changing speech level. 
\[ d_{\max} = \max_{2 \le r \le W} \left| y_{mW+r} - y_{mW+r-1} \right|_{Q2} \tag{1} \]

\[ d_{\mathrm{rms}} = \left[ \frac{1}{W-1} \sum_{r=2}^{W} \left( y_{mW+r} - y_{mW+r-1} \right)^{2} \right]^{1/2}. \tag{2} \]


The quantizer Q2 has the same number of levels as the DPCM
quantizer Q1, but is arranged to quantize only positive samples. Thus,
after multiplexing d_max, d_rms and W DPCM words, the frame consists of
(W + 2) n-bit words. It is important, or at least very desirable, to
protect the “protecting words,” d_max and d_rms, by transmitting them in
a redundant format. For example, one might transmit three versions
of each bit in the protecting word and decode the bit on the basis of a
majority count. The overhead constituted by this protection arrangement
would be 3 × 2 = 6 words, or 2.3 percent if W = 256; this
overhead is much smaller than the redundancy required of an algebraic
code that would correct some patterns we will discuss later in this
paper.
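A minimal sketch of the transmitter-side bookkeeping follows. It is illustrative only: the function names are hypothetical, the quantizer Q2 is omitted, and averaging the squared differences over the W − 1 differences inside the block is an assumption about the normalization in (2).

```python
import math


def protection_words(block):
    """Return (d_max, d_rms) of the adjacent-sample differences in one block."""
    diffs = [block[r] - block[r - 1] for r in range(1, len(block))]
    d_max = max(abs(d) for d in diffs)
    d_rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return d_max, d_rms


def majority_decode(received_bits):
    """Decode one protected bit that was transmitted three times."""
    return 1 if sum(received_bits) >= 2 else 0


def overhead_fraction(block_words, protected_words=2, repeats=3):
    """Extra words per block when each protecting word is sent `repeats` times."""
    return protected_words * repeats / block_words


d_max, d_rms = protection_words([0.0, 1.0, 3.0, 2.0])
print(d_max, d_rms)
print(majority_decode((1, 0, 1)))
print(overhead_fraction(256))
```

For W = 256 the overhead fraction is 6/256, about 2.3 percent, matching the figure quoted above; a single inverted copy of a protecting bit is outvoted by the other two.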

The DPCM-encoded speech together with its simple protection code
is transmitted through a channel which may cause some bits to be
inverted. The probability of bit inversion is called the error rate ER.


2.2 Receiver 


The receiver demultiplexes each frame into its data block and
protection code. The DPCM sequence is decoded into Y_{mW+k}; k = 1, 2,
..., W. (Note that capital letters Y and D will be used to signify variables
at the receiver.) Figure 2 shows the essential features of the SBPC
correction procedure. For simplicity, the demultiplexer decoders and
control facilities have been omitted.

Fig. 2—SBPC decoder.

Fig. 3—Waveform correction at time r.

We suppose that samples ..., Y^C_{mW+r-3}, Y^C_{mW+r-2}, Y^C_{mW+r-1} have
been either considered correct and passed to the output, or deemed to
be in error and partially corrected; hence the superscripts C. We now
test sample Y_{mW+r}. We find the quantized magnitude difference D_{mW+r}
between Y_{mW+r} and Y^C_{mW+r-1}, and compare the difference with the
maximum transmitted difference d_max. Y_{mW+r} must be erroneous if


\[ D_{mW+r} > d_{\max}. \tag{3} \]


If inequality (3) is satisfied, the correction must be switched into the
circuit and the erroneous Y_{mW+r} replaced by a corrected value Y^C_{mW+r}.
The corrections are described by the algorithm (Fig. 3):


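\[ Y^{C}_{mW+r} = Y^{C}_{mW+r-1} + \Delta_{mW+r}, \tag{4} \]

where

\[
\Delta_{mW+r} = d_{\mathrm{rms}} \, \mathrm{sgn}\!\left( Y^{C}_{mW+r-1} - Y^{C}_{mW+r-2} \right)
\quad \text{if } \mathrm{sgn}\!\left( Y^{C}_{mW+r-1} - Y^{C}_{mW+r-2} \right) = \mathrm{sgn}\!\left( Y_{mW+r+1} - Y_{mW+r} \right), \tag{5}
\]

\[ \Delta_{mW+r} = 0 \quad \text{otherwise.} \tag{6} \]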


Clearly, this correction is based on a “smooth” output waveform
model where the sign of the slope at time r equals that at time r − 1
if the latter equals that at time r + 1 [eq. (5)], while, if the slopes at
times r − 1 and r + 1 are opposite in sign, that at time r is given the
average value of zero [eq. (6)]. Furthermore, in the former case, the
magnitude of the slope at time r is set to the block-specific rms value
d_rms. Strictly speaking, the optimum setting of this magnitude would
take the form J · d_rms, where J would be a constant depending on the
shape of the first-difference PDF.

The correction algorithm has also been deployed® with d_rms being
derived from the corrupted speech. With large values of the error rate
ER, this would give rise to poor corrections. By explicitly transmitting
the value of d_rms, the corrections are significantly improved.
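The detection test (3) and the correction rule (4) to (6) can be sketched for a single sample as follows. The function names are hypothetical, the Q2 quantization of the measured difference is omitted, and the sample values in the example are illustrative.

```python
def sign(value):
    """Return +1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def check_and_correct(yc_prev2, yc_prev1, y_now, y_next, d_max, d_rms):
    """Test one decoded sample and, if flagged, return its corrected value.

    yc_prev2, yc_prev1 -- the two preceding (already accepted or corrected) samples
    y_now, y_next      -- the sample under test and the next decoded sample
    Returns (value, flagged).
    """
    if abs(y_now - yc_prev1) <= d_max:       # inequality (3) not satisfied
        return y_now, False
    slope_before = sign(yc_prev1 - yc_prev2)
    slope_after = sign(y_next - y_now)
    # Eq. (5) when the neighboring slopes agree in sign, eq. (6) otherwise.
    delta = d_rms * slope_before if slope_before == slope_after else 0.0
    return yc_prev1 + delta, True            # eq. (4)


print(check_and_correct(0.0, 2.0, 40.0, 42.0, d_max=5.0, d_rms=2.0))
print(check_and_correct(0.0, 2.0, 40.0, 30.0, d_max=5.0, d_rms=2.0))
print(check_and_correct(0.0, 2.0, 4.0, 6.0, d_max=5.0, d_rms=2.0))
```

In the first call the jump of 38 exceeds d_max and both neighboring slopes are positive, so the sample is replaced by the previous value plus d_rms; in the second the slopes disagree and the previous value is held; in the third the sample passes the test unchanged.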


2.3 Updating samples following a correction 


Having made the correction to sample Y_{mW+r}, we remove the error
from the subsequent samples before we continue testing the next
sample Y_{mW+r+1} (Figs. 3 and 4). This is done as follows. Let


\[ \mathrm{DIF}_{mW+r} = Y_{mW+r} - Y^{C}_{mW+r}. \tag{7} \]


As the propagation of the error is due to the integrator, and the 
integrator leakage factor is LK, the subsequent decoded samples are 
reduced to 
\[ Y_{mW+r+n} \leftarrow Y_{mW+r+n} - (LK)^{n} \, \mathrm{DIF}_{mW+r}, \qquad n = 1, 2, \ldots, W - r. \tag{8} \]

The value of the error at the end of the block is

\[ E_{(m+1)W} = (LK)^{W-r} \, \mathrm{DIF}_{mW+r}, \tag{9} \]

and this is stored.

Fig. 4—Error detection, correction and sample updating.

For each correction, the propagation of the error is removed leaving
a residue at the end of the block whose size depends on the position of
the sample corrected, as shown by eq. (9). These residuals are summed
to give the total residual propagation error E^T_{(m+1)W}. When the next
block of DPCM samples is processed, each one is modified to remove
the propagation effects from errors in the previous block:

\[ Y_{(m+1)W+r} \leftarrow Y_{(m+1)W+r} - (LK)^{r} \, E^{T}_{(m+1)W}, \qquad r = 1, 2, \ldots, W. \tag{10} \]


The detection and correction method epitomized by eqs. (3) and (8)
is again used to process the DPCM samples in eq. (10).
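The updating step of eqs. (7) to (9) can be sketched for one block as below. The function name, the 1-based position argument, and the numeric example are hypothetical; the sketch handles a single correction and does not accumulate residuals across several corrections or across blocks.

```python
def remove_propagation(block, r, corrected, lk):
    """Undo the propagated effect of one corrected sample within a block.

    block     -- decoded samples of the block; block[0] is position 1
    r         -- 1-based position of the sample being corrected
    corrected -- corrected value for position r
    lk        -- predictor (integrator leakage) coefficient LK
    Returns (updated block, residual error at the end of the block).
    """
    w = len(block)
    dif = block[r - 1] - corrected                      # eq. (7)
    updated = list(block)
    updated[r - 1] = corrected
    for n in range(1, w - r + 1):                       # eq. (8)
        updated[r - 1 + n] = block[r - 1 + n] - (lk ** n) * dif
    residual = (lk ** (w - r)) * dif                    # eq. (9)
    return updated, residual


updated, residual = remove_propagation([1.0, 10.0, 6.0, 4.0], r=2, corrected=2.0, lk=0.5)
print(updated, residual)
```

Because the decoder integrator leaks by LK per sample, an error of size DIF at position r contributes (LK)^n · DIF to the sample n positions later; subtracting exactly that amount leaves the later samples as they would have been without the error, and the stored residual is what still has to be removed from the following block.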


2.4 Organization of the transmission block 


The first part of the transmission frame contains the protection
code. It is placed there in order for the detection and correction process
to begin immediately. When testing Y_{mW+1}, samples Y_{mW} and Y_{mW-1}
from the previous block must be available. When testing the last
sample, Y_{(m+1)W}, the first sample Y_{(m+1)W+1} from the next data block is
required. Consequently, the total delay of the decoded speech is
(W + 1) sampling intervals.

The larger the value of W, the smaller the fractional increase in 
required channel capacity (due to the protection-word overhead), but 
the longer the decoding delay at the receiver output. 


III. RESULTS AND DISCUSSION


The block protection scheme was simulated on a Data General
Eclipse computer. The band-limited input signal, a single sentence
spoken by a male, was sampled at 8 kHz prior to encoding by a uniform
7-bit DPCM encoder with predictor coefficient LK = 0.9. The coding of
the quantizer output levels was such that an error in the most significant
bit caused an error in the received sample equal to half the range
of the quantizer.

The DPCM code words were assembled into blocks of W words with
the protection code previously described. The DPCM code words were
subjected to random errors, but the protection code words were left
uncorrupted.
As a supplement to listening tests, the segmental signal-to-noise
ratio,‘


SNRSEG = Average [short-time SNR (in dB)], (11)


was used as an objective performance criterion. The short-time SNR in
(11) is a statistic computed over an interval typically 16 to 32 ms long. 
By performing the decibel operation prior to long-time averaging, the 
SNRSEG measure preserves information about how well the low-level 
segments of speech are reproduced. 

Figure 5 shows the variation of SNR as a function of amplitude
scaling AS of the input speech signal. From the zero error rate curve,
it can be seen that optimum loading occurs for AS = 0.04. When ER
= 4.2 percent, the decoded signal is very corrupted and SNRSEG is
reduced by 40 dB in the underloaded condition. However, the SBPC
system dramatically improves the performance of the DPCM system,
increasing the SNRSEG by 11 dB for AS = 0.04, and by 19 dB for AS
= 0.01.

The unusual characteristic of the SBPC system is that, with large
values of ER, the variation of SNRSEG is substantially independent of
AS. This is a property found in adaptive DPCM. The reason for the
nearly flat SNR characteristic is: In the presence of low level speech,
d_max is a low number, and if many errors occur there will be numerous
occasions when the differences between adjacent samples in the
corrupted decoded signal exceed d_max. These erroneous differences are
identified and will be partially corrected. Only those errors which
result in differences less than d_max will be missed. However, when the
coder is occasionally experiencing some overloading (AS > 0.04, say),
the maximum value d_max in some blocks will merely reflect quantizer
saturation, rather than providing a cue for detecting transmission
errors, and improvements are now gained only in purely unvoiced or
silent intervals in speech.

Fig. 5—SBPC gain as a function of input speech level AS.

Fig. 6—SBPC gain as a function of block length W.

Fig. 7—SBPC gain as a function of error rate ER.
Fig. 8—Waveforms of (a) original, (b) corrupted, and (c) corrected speech (ER = 10%).


At AS = 0.01 and ER = 4.2 percent, the corrupted speech is
perceptually very poor, and sounds almost like bandlimited white
noise. By using the SBPC system, the speech is rendered intelligible,
although of poor quality. The overall perceptual improvement is
dramatic.

The variation of SNRSEG as a function of block size W is shown in 
Fig. 6. Increasing W from 32 to 256 results in a decrease in SNRSEG of 
less than 2 dB. The near-independence of SNR from W is perhaps 
related to the fact that none of the W values used is large enough to 
encompass a significantly nonstationary segment of speech. 
The gain in SNRSEG as a function of ER is shown in Fig. 7 for AS =
0.01, W = 256. The objective gain is very slight for low error rates (say,
ER < 0.1 percent), but significant for high error rates (say, ER > 0.5
percent). In particular, the gains are quite dramatic with ER = 10
percent. These objective gains are well reflected by the perceptual
gains noticed in informal listening tests, and by the illustrative speech
waveforms in Fig. 8.
An ADPCM Approach to Reduce
the Bit Rate of μ-Law Encoded Speech


By H. W. ADELMANN, Y. C. CHING, and B. GOTZ 
(Manuscript received March 14, 1979) 


A new ROM-oriented ADPCM architecture for processing μ-law encoded
speech is presented. The architecture is shown to offer:
(i) simple hardware realization, (ii) flexibility of algorithm modification,
(iii) fixed hardware complexity with respect to numerous
nonlinear processings, and (iv) excellent time-sharing capability.
Performance measurements of a hardware implementation are also
included.


1. INTRODUCTION 


Bit rate reduction from the present 64 kb/s per trunk is of interest 
both to reduce per-trunk cost of existing and planned digital transmis- 
sion facilities and to prove-in digital transmission. 

Adaptive differential PCM (ADPCM) coding of the type reported in
Refs. 1 through 4 appears promising for bit rate reduction of speech
signals. Our specific interest is the bit rate reduction of the conventional
μ255, 64-kb/s signal. The purpose of this paper is to present an
ADPCM architecture that offers: (i) simple hardware realization,
(ii) flexible algorithm modification, (iii) fixed hardware complexity
with respect to numerous nonlinear processings, and (iv) time-sharing
capability over many trunks. The performance of a hardware realization
of such a coder, which is shared by 24 voice trunks, is also
reported.


II. THE PROPOSED ADPCM ARCHITECTURE
2.1 Basic structure 


DPCM with adaptive backward quantization and fixed one-tap predictor,
as proposed and analyzed in Refs. 1, 2, and 3, is our point of
departure. Figure 1 is the conventional block diagram of an ADPCM
codec and its interface to a μ-law environment. The quantizer with
scale (e.g., uniform step size) Δ is denoted by Q_Δ, while the “inverse
quantizer” which maps a quantization interval code JQ into a quantized
signal level is denoted by Q_Δ^{-1}. The quantization scale Δ(i) is determined
by a scale adaptation algorithm from the quantization index
sequence JQ(i)*, which is sent over the channel. We add a prime to
the decoder variables.

Prior research on ADPCM assumes a fixed linear signal representation. 
In the case of the μ-law signal environment, this requires μ-to-linear 
(μ/L) and L/μ conversions, as shown in Fig. 1. The 14 bits per sample 
required by the linear representation is an indication of the complexity 
of the straightforward approach. 

Our basic approach is to put as much of the ADPCM algorithm as is 
presently practical into read-only memory (ROM). This allows us to 
confine the 14-bit linear representation to the ROM table computation 
(i.e., to the firmware). We motivate our architecture by relating it to 
the more familiar architecture of Fig. 1. For this purpose, assume a 
uniform B-bit quantizer, which can be considered to be a cascade of an 
infinite amplitude quantizer and a B-bit limiter. The argument to 
follow could be generalized to a nonuniform quantizer. (Note that the 
limiting is on the quantizer output code as opposed to input amplitude.) 
Retaining the limiter in its place but shifting the quantizer and inverse 
quantizer past the difference and summing nodes† produces Fig. 2 
from Fig. 1. (For convenience, only a local decoder is shown together 
with the new notation.) To complete the transition to the proposed 
architecture as shown in Fig. 3, we first combine μ/L and $Q_\Delta$ into 
one block for μ/APCM conversion via a ROM realization. We next note that, 
instead of X,, we could store any other absolute representation of the 
local decoder estimate; in particular, we use the μ-law representation. 
Combining $Q_\Delta^{-1}$ and L/μ into an APCM/μ block and combining the 
μ/L, predictor multiplication, and $Q_\Delta$ into a (μ/APCM)$_a$ block, 
we obtain the proposed architecture shown in Fig. 3, where μ/APCM, 
APCM/μ, (μ/APCM)$_a$, and SCALE ADAPTATION are to be implemented with ROMs. 

The effects of these actions are discussed below. By moving the 
adaptive quantizers through the difference node, we can combine many 
nonlinear functions, which are difficult to realize in combinatorial 
logic, into ROM tables. The generation of these tables can be achieved 
with an off-line computer, where the 14-bit precision inherent in the 
μ-law code is used. In addition, the outputs of these ROMs can be 
restricted to 8-bit APCM words that can be easily manipulated. We next 
examine in more detail our proposed quantization and scale adaptation 
scheme. 
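
Shifting the quantizer past the difference and summing nodes relies on a uniform quantizer being additive to within one interval. The following is a minimal sketch of that property, assuming a floor-type uniform quantizer index; the names `q_index` and `additivity_error` and the step value are illustrative and do not come from the paper.

```python
def q_index(x: int, step: int) -> int:
    """Index of the uniform quantizer interval containing x (floor rule, assumed)."""
    return x // step


def additivity_error(x: int, y: int, step: int) -> int:
    """Q(x + y) - Q(x) - Q(y), expressed in quantizer intervals."""
    return q_index(x + y, step) - q_index(x, step) - q_index(y, step)


if __name__ == "__main__":
    step = 6  # illustrative step size
    errors = {additivity_error(x, y, step)
              for x in range(-200, 201)
              for y in range(-200, 201)}
    # Every observed error is at most one interval in magnitude.
    print(sorted(errors))
```

With a rounding quantizer instead of a floor quantizer the error would be symmetric about zero, but it would still be bounded by one interval.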


* $IQ \in \{\pm 0, \pm 1, \ldots, \pm(2^{B-1} - 1)\}$, where B is the number of bits/sample sent over the 
channel. 
† Note that Q(x + y) = Q(x) + Q(y) to within one quantizer interval. 

Fig. 1—Conventional ADPCM structure and interface to μ-law environment. 

Fig. 2—Relating a conventional to a proposed ADPCM structure. 


2.2 Quantization and scale adaptation details 


The ROM architecture of Fig. 3 permits a very general interpretation 
of scale as scale index d and of $Q_\Delta$ and $Q_\Delta^{-1}$ as 
corresponding arbitrary quantizers. In this general framework, the 
adaptation rule of Refs. 1, 2, and 3 evolves to the following heuristic 
scale index adaptation and corresponding quantization constraint: For 
large (small) magnitude prediction error, the index d is increased 
(decreased) and the corresponding quantization $Q_\Delta$ and 
$Q_\Delta^{-1}$ is made coarser (finer). The quantization we propose is 
a marriage of uniform and μ-255 quantization; we call it semiuniform 
quantization. The semiuniform quantization and corresponding adaptation 
strategy are best explained by starting with uniform quantization, where 
scale is synonymous with step size of the uniform quantizer. 

Denoting the uniform quantizer step by Δ, the adaptation algorithm 
is 

$$\Delta(i+1) = \Delta(i)\,M(i), \tag{1}$$

where $M(i) \triangleq M(|IQ(i)|) \triangleq M_{|IQ(i)|}$ are positive and have the ordering 

$$M_0 \le M_1 \le \cdots \le M_{2^{B-1}-1},$$

where 

$$M_0 < 1 \quad \text{and} \quad M_{2^{B-1}-1} > 1. \tag{2}$$

Fig. 3—The proposed ADPCM architecture. 

In Refs. 2 and 5, optimized adaptation vectors 

$$\mathbf{M} = (M_0, M_1, \ldots, M_{2^{B-1}-1})$$

are proposed and related to quantizer input variance estimation. It will 
be convenient to work with the log transformed version of (1), 

$$d(i+1) = d(i) + m(i), \tag{3}$$

where $\Delta = \Delta_0 q^{d}$ and $M = q^{m}$ for some convenient fixed $\Delta_0$ and base q. 
The scale adaptation can thus be done with the simpler iteration (3), 
while the translation from d to Δ (d/Δ) is done in the ROM generating 
software, as indicated in Fig. 3. We next discuss the finite precision 
aspects of a digital realization and the associated ROM size implications. 
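
The equivalence between the multiplicative rule (1) and the additive rule (3) can be checked numerically. The sketch below is illustrative only: it borrows the base q = √2 and the adaptation vector quoted in the hardware section, assumes that vector is indexed by the increment index IA, and omits the limits on d that are introduced later.

```python
import math

DELTA_0 = 2.0                       # minimum step size (first-chord step, per the text)
Q_BASE = math.sqrt(2)               # base q, value quoted in the hardware section
M_LOG = (-1, 0, 0, 0, 1, 2, 3, 5)   # m values; indexing by IA is an assumption


def adapt_multiplicative(ia_sequence):
    """Iterate Eq. (1): delta(i+1) = delta(i) * M(i), with M = q**m."""
    delta = DELTA_0
    for ia in ia_sequence:
        delta *= Q_BASE ** M_LOG[ia]
    return delta


def adapt_additive(ia_sequence):
    """Iterate Eq. (3): d(i+1) = d(i) + m(i), then map d to delta_0 * q**d."""
    d = 0
    for ia in ia_sequence:
        d += M_LOG[ia]
    return DELTA_0 * Q_BASE ** d


if __name__ == "__main__":
    sequence = [7, 4, 0, 3, 6, 0, 0, 5]   # arbitrary increment indices
    print(adapt_multiplicative(sequence), adapt_additive(sequence))
```

Only the additive form has to run in real time; the mapping from d to Δ can stay inside the table-generating software.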

Consider first the scale adaptation algorithm. We assume d to be a 
K-bit nonnegative integer, $0 \le d \le 2^K - 1$, representing $2^K$ 
permissible scales. Thus, $\Delta_{\min} = \Delta_0$ and 
$\Delta_{\max} = \Delta_0 q^{2^K-1}$. To permit the ADPCM coder to 
approximate the idle channel performance of the μ-law coder, we let the 
minimum step size $\Delta_0$ equal the first chord μ-law step size 
which, for the 14-bit linear representation we use, corresponds to 
$\Delta_0 = 2$. To accommodate the 40- to 50-dB dynamic range of a 
speech signal, it is desired that 

$$\frac{\Delta_{\max}}{\Delta_{\min}} = q^{2^K-1} \tag{4}$$

be between 100 and 300. The step-size resolution factor q should not 
be greater than 2, which is the μ-law resolution; nor is much to be 
gained by making q too close to 1. A useful design range is 
$q \in [2^{1/4}, 2]$ and $K \in [3, 5]$, with a smaller q requiring the 
larger K. Figure 3 assumes K = 5, although a smaller value might be 
sufficient. Figure 4 is a block diagram for the scale adaptation. We 
show a scale adaptation preprocessor where the adaptation increment 
index IA is based on the quantizer output past history such as min-max, 
averaging, or other.* For concreteness, we assume IA to be the three 
most significant magnitude bits of IQ. We have introduced F additional 
fractional bits for the internal scale index, which we designate by 
$d_I$. The scale adaptation iteration is 


$$d_I(i+1) = \begin{cases} 0 & \text{if } d_I(i) + m(i) < 0 \\ d_I(i) + m(i) & \text{otherwise} \\ d_{\max} & \text{if } d_I(i) + m(i) > d_{\max}, \end{cases} \tag{5}$$

where $d_I$ and m(i) have K-integer plus F-fractional-bit representation 
and $d_{\max} \le \log_q(16{,}000/2^{B-1})$ (where the upper bound on 
$d_{\max}$ corresponds to the step size necessary to accommodate the 
largest possible prediction error (16,000) without overload).* The F 
fractional bits are useful for two reasons. First, they permit 
adaptation design flexibility even with q = 2; for example, with q = 2 
and F = 0, the largest step-reducing multiplier (closest to 1) is 
$2^{-1} = 0.5$, while with F = 3 the largest step-reducing multiplier is 
$2^{-1/8} \approx 0.9$. Second, they are useful in the control of 
adaptation mistrack due to channel errors (Refs. 6 and 7). Figure 4 
assumes K = 5, F = 3, and up to 8 distinct adaptation multipliers, 
implying a $2^{11} \times 8$ ROM. 

* In the hardware realization we have found useful a two-word memory preprocessor 
where the most significant previous |IQ| bit in addition to the two current most 
significant |IQ| bits determines the current IA. 

Fig. 4—Scale adaptation block diagram. 
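
The clamped iteration (5) and the resulting table size can be sketched as an off-line table generator. This is a hypothetical illustration, not the authors' firmware: the address layout (scale index in the high bits, increment index in the low bits), the choice d_max = 22, and the reuse of the adaptation vector from the hardware section in integer units of d are all assumptions.

```python
K_BITS = 5       # integer bits of the scale index (value assumed for Fig. 4)
F_BITS = 3       # fractional bits of the internal scale index
IA_BITS = 3      # up to 8 distinct adaptation increments
D_MAX = 22       # illustrative upper limit on d (assumed)
M_LOG = (-1, 0, 0, 0, 1, 2, 3, 5)  # increments in integer units of d (assumed)


def next_scale_index(d_int: int, ia: int) -> int:
    """Eq. (5), with the internal index stored as an integer count of 2**-F."""
    candidate = d_int + (M_LOG[ia] << F_BITS)
    return max(0, min(candidate, D_MAX << F_BITS))


def build_scale_rom() -> list:
    """One 8-bit word per (d_I, IA) address: 2**(K + F + IA_BITS) words."""
    rom = []
    for address in range(1 << (K_BITS + F_BITS + IA_BITS)):
        d_int = address >> IA_BITS               # high bits: internal scale index
        ia = address & ((1 << IA_BITS) - 1)      # low bits: increment index
        rom.append(next_scale_index(d_int, ia))
    return rom


if __name__ == "__main__":
    rom = build_scale_rom()
    print(len(rom), max(rom))
```

Changing the adaptation rule then amounts to regenerating this table; the table dimensions stay the same.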

Consider next the quantization. For strictly uniform quantization, 
large amplitude, slowly varying ADPCM input signals would generate 
small step sizes and large magnitudes for $X_{APCM}$, $\hat{X}_{APCM}$, 
and $X_{P,APCM}$. These large magnitudes might require a large number of 
bits to avoid amplitude overload. The amplitude overload problem as a 
function of the number of bits/word (≤14) must be assessed. We propose 
to avoid the amplitude overload problem with the following semiuniform 
quantization scheme. For each scale index d, there is a threshold value 
T(d) such that for input magnitudes below T(d) the quantization is 
uniform, while for magnitudes above T(d) the quantization is a copy of 
the μ-law quantizer, as illustrated in Fig. 5. The threshold T(d) is 
located at approximately a μ-law segment boundary such that, for the 
segments below T(d), the segment step sizes are 
$\le \Delta = \Delta_0 q^{d}$ while, for the segments above T(d), the 
segment step sizes are > Δ. Thus wasteful quantizer level assignments 
(and therefore bits) are avoided when the final μ-law quantization would 
ignore them anyway. Figure 6 shows the details of uniform and 
semiuniform quantization assuming Δ = 3. The important feature of the 
semiuniform quantization is that the number of quantizing intervals is 
≤256 and therefore 8 bits is sufficient for the representation of 
$X_{APCM}$, $\hat{X}_{APCM}$, and $X_{P,APCM}$. The blocks μ/APCM, 
APCM/μ, and (μ/APCM)$_a$, including a sign magnitude/2’s-complement or 
inverse conversion, can each be realized with a $2^{13} \times 8$ ROM, 
or alternatively one can use $2^{12} \times 7$ ROMs with the sign bit 
feeding around the ROMs to separate sign magnitude/2’s-complement 
converters. 

* For example, for $q = \sqrt{2}$ and B = 4, $d_{\max} = 22$ and therefore K = 4 with $d_{\max} = 15$ 
or K = 5 with $d_{\max} \in [16, 22]$ should be satisfactory. 

Fig. 6—Illustrating uniform and semiuniform quantization. 
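
The interval-count argument for semiuniform quantization can be illustrated with a short sketch. It assumes the usual μ-255 segment end points and step sizes in 14-bit linear units (end points 31, 95, ..., 8159 and steps 2, 4, ..., 256), which the paper does not list, and it takes T(d) to be the end of the last segment whose step does not exceed Δ. The function names are hypothetical.

```python
import math

# Assumed mu-255 segment end points and step sizes (14-bit linear units).
SEGMENT_ENDS = [31, 95, 223, 479, 991, 2015, 4063, 8159]
SEGMENT_STEPS = [2, 4, 8, 16, 32, 64, 128, 256]
INTERVALS_PER_SEGMENT = 16


def threshold(delta: float) -> int:
    """T(d): end of the last segment whose step size does not exceed delta."""
    t = 0
    for end, step in zip(SEGMENT_ENDS, SEGMENT_STEPS):
        if step <= delta:
            t = end
    return t


def intervals_per_polarity(delta: float) -> int:
    """Uniform intervals below T(d) plus copied mu-law intervals above it."""
    uniform = math.ceil(threshold(delta) / delta)
    copied = INTERVALS_PER_SEGMENT * sum(1 for s in SEGMENT_STEPS if s > delta)
    return uniform + copied


if __name__ == "__main__":
    for d in range(0, 23, 2):
        delta = 2.0 * 2.0 ** (d / 2)   # delta_0 = 2, q = sqrt(2)
        print(d, threshold(delta), intervals_per_polarity(delta))
```

Under these assumptions the count never exceeds 128 per polarity, which is consistent with the 8-bit word size quoted in the text.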

Although the optimizations reported in Refs. 2, 3, and 5 do not 
quantitatively apply to our ADPCM coder, the qualitative concepts 
developed there with respect to loading, adaptation speed, and others 
are very useful in tuning our coder, and they could be utilized in our 
hardware realization. 

The next section briefly discusses two approaches to dealing with 
transmitter-receiver scale mistrack due to channel errors. The discus- 
sion serves to illustrate the versatility of our ROM architecture. 


III. CHANNEL ERROR AND SCALE MISTRACK 


In a backward adapting quantizer, the possibility of scale mistrack 
due to channel errors must be considered. A robust quantizer is 
obtained by modifying (3) [and similarly modifying (5)] to 


$$d(i+1) = \beta\,d(i) + m(i), \tag{6}$$


where β is some number less than but close to 1. The modification in 
(6) is incorporated into our architecture by a change of the scale 
adaptation ROM. References 6 and 7 discuss in detail performance 
aspects and design guidelines related to the finite arithmetic 
implementation of (6). Also, the improved quantizer reconstruction strategy 
recommended for the robust quantizer in Ref. 8 can be implemented 
by a change in the APCM/μ reconstruction table, i.e., the APCM/μ ROM. 
These improvements require no additional hardware. 
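
A short sketch shows why the leak in (6) makes the quantizer robust: when transmitter and receiver apply the same increments, their difference shrinks by the factor β at every step. The value β = 63/64 and the function names are illustrative assumptions, and the finite-precision effects treated in Refs. 6 and 7 are ignored here.

```python
BETA = 63 / 64  # illustrative leak factor (assumed); the text only asks for beta just below 1


def leaky_step(d: float, m: float, beta: float = BETA) -> float:
    """Eq. (6): d(i+1) = beta * d(i) + m(i)."""
    return beta * d + m


def mistrack_after(d_tx: float, d_rx: float, increments) -> float:
    """Apply identical increments at both ends and return |d_tx - d_rx|."""
    for m in increments:
        d_tx = leaky_step(d_tx, m)
        d_rx = leaky_step(d_rx, m)
    return abs(d_tx - d_rx)


if __name__ == "__main__":
    increments = [1, 0, -1, 2, 0, 3] * 20   # arbitrary increment history
    # A channel error is modeled as an initial offset of 4 between the two ends.
    print(mistrack_after(10.0, 14.0, increments))
```

After n steps the offset equals the initial offset multiplied by β to the power n, independent of the increment history.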
An alternate approach to scale mistrack control is to modify (3) to 

$$d_I(i+1) = d_I(i) + m(i, d_I(i)), \tag{7a}$$

where 

$$m(i, d_I(i)) = \begin{cases} m(i) - 2^{-F} & D_2 < d_I(i) \\ m(i) & D_1 \le d_I(i) \le D_2 \\ m(i) + 2^{-F} & d_I(i) < D_1, \end{cases} \tag{7b}$$

$m = \log_q M$ is an optimized adaptation vector in the absence of 
channel errors, and $D_1$ and $D_2$ (e.g., $D_1 = (1/3)d_{\max}$ and 
$D_2 = (2/3)d_{\max}$) are design constants which, for the reason to 
follow, we call mistrack correction levels. Each iteration that 
$d_I(i)$ and $d_I'(i)$ are on opposite sides of n (n = 1 or 2) of the 
correction levels, the mistrack $|d_I(i) - d_I'(i)|$ is reduced by 
$n2^{-F}$. For* $2^{-F} \ll |m|_{\min}$, the approach in (7) should 
avoid the error-free dynamic range penalty (Refs. 6 and 7) associated 
with (6). Again, the important point is that (7) (or a generalization 
of it to an arbitrary number of correcting levels) can be incorporated 
into the codec by a simple ROM table modification without hardware or 
complexity penalty. 

Fig. 7—ADPCM coder performance measurements. 
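
The correction-level rule (7) can be exercised on a single iteration to confirm the stated reduction of n·2^(−F). The sketch below assumes F = 3 and d_max = 22 purely for illustration and uses the example levels D1 = d_max/3 and D2 = 2·d_max/3 from the text; limiting to [0, d_max] is left out.

```python
F_BITS = 3
D_MAX = 22.0                         # illustrative upper limit on the scale index (assumed)
D1, D2 = D_MAX / 3, 2 * D_MAX / 3    # mistrack correction levels (example from the text)


def corrected_increment(m: float, d_int: float) -> float:
    """Eq. (7b): nudge the increment toward the middle band [D1, D2]."""
    nudge = 2.0 ** -F_BITS
    if d_int > D2:
        return m - nudge
    if d_int < D1:
        return m + nudge
    return m


def step(d_int: float, m: float) -> float:
    """Eq. (7a): d_I(i+1) = d_I(i) + m(i, d_I(i))."""
    return d_int + corrected_increment(m, d_int)


if __name__ == "__main__":
    d_tx, d_rx, m = 20.0, 5.0, 1.0   # the two ends sit on opposite sides of both levels
    before = abs(d_tx - d_rx)
    after = abs(step(d_tx, m) - step(d_rx, m))
    print(before - after)            # reduction achieved in one iteration
```

When both indices lie in the same band, the nudges cancel and the mistrack is left unchanged, so the error-free behavior is disturbed only slightly.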


IV. HARDWARE REALIZATION AND PERFORMANCE 


We have constructed an ADPCM coder which interfaces directly with 
a D3 bank. Because our coder uses parallel processing and the ROMs 
are relatively fast, we can share the coder among all 24 voice trunks of 
the channel bank. In fact, over 120 trunks could be accommodated if 
we so desire. The coder is implemented with three wire-wrapped 
circuit packs: one each for the interface, the transmitter, and the 
receiver. 

* If it is desired to have a small m such as $m = \pm 2^{-F}$ or m = 0, then (7b) can be simply 
modified to exclude the perturbation $2^{-F}$ for those m. The ROM modification is again 
straightforward. Also, interleaving as in Ref. 6 should reduce the required F. 

For the signal-to-noise measurements, we chose a prediction 
coefficient a = 0.8 and a scale adaptation vector of (−1, 0, 0, 0, 1, 2, 3, 5) 
with a base $q = \sqrt{2}$. Because of the flexibility of the coder, we were 
able to vary the number of bits per sample feeding the transmission 
channel as well as the local decoder. The measurement arrangement 
is shown in Fig. 7. The signal-to-noise ratio versus frequency for the 
coder is shown by the graph of Fig. 8. The measurements were made with 
the coder operating in the 5-bit ADPCM mode and with 5 bits in the 
feedback circuit. The analog input was −15 dBm0. As expected, the 
coder performance drops with increased frequency. Above 1500 Hz, 
when the adjacent sample correlation is below 0.4, the differential 
mode of coding has no advantage and in fact has a penalty. The 
remainder of the performance tests were made by measuring the 
signal-to-noise ratio versus amplitude for 1005 Hz. These measurements were 
made for the following coder options: 

(i) ADPCM with the number of transmit bits equal to 5 and the 
number of feedback bits equal to 5, 4, and 3. 

(ii) ADPCM with the number of transmit and feedback bits both 
equal to 4 and 3. 

Figure 9 demonstrates the performance insensitivity as the number 
of feedback bits is dropped. Figure 10 shows the performance for 3 and 
4 bits. Experimental and simulation results for 1004 Hz coding with 
4-bit ADPCM are compared in Fig. 11. 

Fig. 8—ADPCM coder performance, frequency response (5-bit feedback; input = −15 dBm0). 
Fig. 9—ADPCM coder performance, 5-bit. 
Fig. 10—ADPCM coder performance, 4- and 3-bit. 
Fig. 11—Laboratory measurement vs simulation. 


Transform Domain Motion Estimation 


By J. A. STULLER and A. N. NETRAVALI 
(Manuscript received March 21, 1979) 


This paper introduces an algorithm for estimating the displace- 
ment of moving objects in a television scene from spatial transform 
coefficients of successive frames. The algorithm works recursively in 
such a way that the displacement estimates are updated from coefficient 
to coefficient. A promising application of this algorithm is in 
motion-compensated interframe hybrid transform-DPCM image cod- 
ing. We give a statistical analysis of the transform domain displace- 
ment estimation algorithm and prove its convergence under certain 
realistic conditions. An analytical derivation is presented that gives 
sufficient conditions for the rate of convergence of the algorithm to be 
independent of the transform type. This result is supported by a 
number of simulation examples using Hadamard, Haar, and Slant 
transforms. We also describe an extension of the algorithm that 
adaptively updates displacement estimation according to the local 
features of the moving objects. Simulation results demonstrate that 
the adaptive displacement estimation algorithm has good conver- 
gence properties in estimating displacement even for very noisy im- 
ages. 


I. INTRODUCTION 


The coefficient-recursive algorithm described in this paper estimates 
the displacement of objects in a television scene. It is a generalization 
of a pel-recursive displacement estimation algorithm recently 
introduced by Netravali and Robbins.’ Coefficient-recursive displacement 
estimation has potential application in hybrid transform-DPCM*‘ 
interframe image coders of the type discussed by Reader,’ Roese,® and 
Jones.’ The performance of a hybrid transform-DPCM interframe coder 
using coefficient-recursive motion compensation is described in a 
companion paper (Ref. 8). 

Before defining the coefficient-recursive displacement estimation 
algorithm, it is useful to first describe pel-recursive displacement 
estimation. Let $I(x_k, t)$ denote the intensity of a scene at the kth 
sample point $x_k$ of a scan line, and let $I(x_k, t - \tau)$ denote the 
intensity at the same spatial location in the previous frame. If the 
scene consists of an object that is undergoing pure translation, then, 
neglecting background, 

$$I(x_k, t) = I(x_k - D, t - \tau), \tag{1}$$

where D is the displacement of the object in one frame interval τ. 
Pel-recursive displacement estimation attempts to estimate D by 
minimizing the squared value of the displaced frame difference, 

$$DFD(x_k, \hat{D}) = I(x_k, t) - I(x_k - \hat{D}, t - \tau), \tag{2}$$

recursively with k using a steepest descent algorithm of the form: 

$$\hat{D}_{k+1} = \hat{D}_k - \tfrac{1}{2}\,\epsilon\,\nabla_{\hat{D}_k}\,[DFD(x_k, \hat{D}_k)]^2, \tag{3a}$$

where $\nabla_{\hat{D}_k}$ is the two-dimensional gradient operator with respect to $\hat{D}_k$. 
Carrying out this operation in (3a) and using (2) yields 

$$\hat{D}_{k+1} = \hat{D}_k - \epsilon\,DFD(x_k, \hat{D}_k)\,\nabla I(x_k - \hat{D}_k, t - \tau), \tag{3b}$$

where $\nabla = \nabla_x$ is the two-dimensional spatial gradient operator with 
respect to horizontal and vertical coordinates $x_1$ and $x_2$ in $x = (x_1, x_2)^T$: 

$$\nabla I(x_k - \hat{D}_k, t - \tau) = \nabla_x I(x, t - \tau)\big|_{x = x_k - \hat{D}_k}. \tag{4}$$

Superscript T denotes transpose of a vector or matrix. The pel-domain 
interframe coder of Netravali and Robbins predicts intensity $I(x_k, t)$ 
by the displaced previous frame intensity $I(x_k - \hat{D}_k, t - \tau)$ using 
interpolation for nonintegral $\hat{D}_k$. If the magnitude of the prediction 
error exceeds a predetermined threshold, the coder transmits a 
quantized version of $DFD(x_k, \hat{D}_k)$ and address information to the receiver. 
Both receiver and transmitter then update $\hat{D}_k$ according to (3b) using 
this quantized version. Netravali and Robbins’ found that a coder 
using this algorithm consistently obtained bit rates that were 30 to 60 
percent lower than those obtained by “frame-difference” prediction, 
which is commonly used in interframe coders. 
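
The pel-recursive update (3b) can be sketched in one dimension. This is an illustration under simplifying assumptions, not the coder of Netravali and Robbins: the scan line is a synthetic sinusoid whose shifted values and derivative are available analytically (so no interpolation or quantization is involved), and the step size ε is chosen by hand for that signal.

```python
import math

TRUE_D = 2.0     # true displacement in pels per frame (illustrative)
EPSILON = 1e-3   # step size, chosen by hand for this synthetic signal


def prev_frame(x: float) -> float:
    """Previous-frame intensity: a smooth synthetic scan line (assumed)."""
    return 100.0 * math.sin(0.2 * x) + 128.0


def prev_gradient(x: float) -> float:
    """Spatial derivative of the previous frame, known analytically here."""
    return 20.0 * math.cos(0.2 * x)


def cur_frame(x: float) -> float:
    """Eq. (1): the present frame is the previous frame shifted by TRUE_D."""
    return prev_frame(x - TRUE_D)


def estimate_displacement(num_pels: int, d_hat: float = 0.0) -> float:
    """Run Eq. (3b) once per pel along the scan line."""
    for k in range(num_pels):
        dfd = cur_frame(k) - prev_frame(k - d_hat)           # Eq. (2)
        d_hat -= EPSILON * dfd * prev_gradient(k - d_hat)    # Eq. (3b)
    return d_hat


if __name__ == "__main__":
    print(estimate_displacement(200))
```

Each update moves the estimate along the local gradient by an amount proportional to the displaced frame difference, so the estimate stops changing once the difference vanishes.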

In an interframe hybrid transform-DPCM coder,’ individual frames 
of video are partitioned into blocks having dimension $N_r$ rows by $N_c$ 
columns, and a two-dimensional transform is performed on each block 
to produce a set of coefficients. The transform coefficients of the qth 
block of the present frame are predicted by the corresponding 
coefficients of the qth block of the previous (reference) frame and, if the 
prediction error is sufficiently high, the quantized prediction errors are 
transmitted to the receiver. These quantized errors add as correction 
terms to the coefficients predicted by the decoder, which 
inverse-transforms the result to obtain the decoded image. This process repeats 
with both coder and decoder predicting the transform coefficients of 
the next frame by the coefficients of the decoded frame, as illustrated 
in Fig. 1. In this type of codec, data compression is achieved both by 
the redundancy reduction implicit in the prediction process and by the 
fact that some coefficients can be reproduced with low precision (or 
totally omitted) without visibly degrading the reconstructed image. An 
advantage of interframe hybrid transform-DPCM coding over 
conventional (3-dimensional block) interframe transform coding’ is the fact 
that the hybrid coder requires only a single frame of storage while the 
conventional transform coder requires many. 

In a motion-compensated hybrid transform-DPCM coder of the type 
envisioned (Fig. 2), the nth coefficient of the qth present-frame block 
would be predicted by the nth coefficient of the displaced qth block of 
the previous frame where the displacement is a recursively updated 
estimate of frame-to-frame translation of the moving object. In this 
paper, we introduce and analyze a displacement estimation technique 
that operates recursively on coefficients in a manner analogous to the 
way (3) operates on pels. 

Section II of this paper defines the coefficient-recursive displace- 
ment estimation algorithm for any real linear transform and gives 
illustrative simulation results using a separable 2-row by 8-column 
(2 × 8) transform block. A statistical analysis of the algorithm is given in 
Section III. In the analysis of Section III, a single frame is modeled as 
an image drawn at random from a stationary and ergodic ensemble of 
images. This random sample is assumed to be undergoing pure trans- 
lation from frame to frame. An important result of this analysis is 
stated in Assertion 3 of Section 3.2, which says that, under certain 
conditions, the convergence properties of the coefficient-recursive dis- 
placement estimation are independent of the transform used. Section 
III presents simulation results that support this claim using Hadamard, 
Haar, and Slant transforms. Section IV describes an adaptive version 
of the coefficient-recursive algorithm and presents simulation results 
that indicate that this version can be used to some advantage in 
displacement estimation for noisy images. Illustrative simulation 
results are shown here using a 2 × 4 cosine transform block. 

The algorithms discussed in this paper are local in nature and, as 
such, can estimate the individual frame-to-frame displacements of 
several objects that may be present in the television scene. However, 
we emphasize that all results presented here apply to objects 
undergoing pure translation; other types of motion are applicable to this study 
to the extent that these can be approximated by pure translation over 
the spatial dimensions of a transform block. Background uncovered by 
moving objects is also ignored throughout this paper. In spite of the 
approximations involved, simulation results to be described in Ref. 8 
show that the coefficient-recursive displacement algorithm studied 
here can be substantially beneficial when used in an interframe 
hybrid-DPCM codec operating on real-life scenes. 

Fig. 1—Simplified diagram of hybrid transform-DPCM codec. The transform/inverse transform operations inside the dashed box are 
included for conceptual purposes. 


II. COEFFICIENT-RECURSIVE DISPLACEMENT ESTIMATION 


Let a field of video be partitioned into rectangular blocks of pels, 
each having dimension $N_r$ rows and $N_c$ columns ($N_r \times N_c$). Let 
$x_q = (x_{1q}, x_{2q})^T$ denote the coordinates of the upper left-hand pel of the qth 
block, where the blocks in each row of blocks are numbered from left 
to right with q = 0, 1, 2, .... We number the $N = N_r N_c$ pel intensities 
of block q in a column-scanning fashion and denote them by a column 
vector $\mathbf{I}(x_q, t)$. Let the N-component vector $\phi_n$ be the nth basis vector 
of a nonzero but otherwise arbitrary real linear transform, and denote 
the nth coefficient of the qth block of this transform in the present 
frame by $c_n(q)$, where 

$$c_n(q) = \mathbf{I}^T(x_q, t)\,\phi_n \tag{5}$$

and n is numbered from 0 to N − 1. The displaced previous frame 
value of this coefficient is 

$$\hat{c}_n(q, \hat{D}) = \mathbf{I}^T(x_q - \hat{D}, t - \tau)\,\phi_n, \tag{6}$$

where $\mathbf{I}(x_q - \hat{D}, t - \tau)$ is the column vector of intensities of the 
displaced qth block of the previous frame and $\hat{D}$ is the estimated 
displacement of the moving object. Computation of the elements in 
$\mathbf{I}(x_q - \hat{D}, t - \tau)$ generally requires an interpolation among the given 
previous-frame pel intensities. Prediction of $c_n(q)$ of (5) by $\hat{c}_n(q, \hat{D})$ of 
(6) results in coefficient prediction error 

$$e_n(q, \hat{D}) = [\mathbf{I}(x_q, t) - \mathbf{I}(x_q - \hat{D}, t - \tau)]^T \phi_n. \tag{7}$$


The algorithm defined in this section attempts to decrease the 
squared-prediction errors $e_n^2(q, \hat{D})$ in a coefficient-recursive manner by steepest 
descent iteration of the form 

$$\hat{D}_{n+1}(q) = \hat{D}_n(q) - \frac{\epsilon}{2}\,\nabla_{\hat{D}_n}\, e_n^2(q, \hat{D}_n(q)) = \hat{D}_n(q) - \epsilon\, e_n(q, \hat{D}_n(q))\, G_n(q) \tag{8a}$$

for n = 0, 1, ..., M − 2, M ≤ N, and q = 0, 1, 2, ..., with 

$$\hat{D}_0(q) = \hat{D}_{M-1}(q-1) - \epsilon\, e_{M-1}(q-1, \hat{D}_{M-1}(q-1))\, G_{M-1}(q-1). \tag{8b}$$

In (8), $G_n(q)$ is the coefficient gradient vector 

$$G_n(q) = \nabla \mathbf{I}^T(x_q - \hat{D}_n(q), t - \tau)\,\phi_n. \tag{9}$$


Note that (8) operates upon coefficients 0 through M − 1, where for 
generality we assume M ≤ N. Iteration in (8) progresses as follows. 
The initial displacement estimate of the qth block (q ≥ 1), $\hat{D}_0(q)$, is 
formed by updating the final displacement estimate $\hat{D}_{M-1}(q-1)$ of 
the previous block as in (8b). The next displacement estimate of the 
qth block, $\hat{D}_1(q)$, is formed from (8a) with n = 0. Iteration progresses in 
the qth block by (8a) with n = 1, 2, ..., M − 2, resulting finally in 
displacement estimate $\hat{D}_{M-1}(q)$ which, when updated in (8b) (with 
q → q + 1), forms the initial displacement estimate $\hat{D}_0(q+1)$ of block 
q + 1. This iteration procedure continues along all horizontal blocks of 
raster. The procedure is started in the q = 0 block with an arbitrarily 
chosen initial displacement estimate $\hat{D}_0(0)$ followed by iterations of 
(8a) for n = 0, 1, ..., M − 2 and q = 0. In the sequel we assume that 
$\hat{D}_0(0)$ is zero. 

The envisioned motion-compensated interframe hybrid 
transform-DPCM coder transmits a quantized version of coefficient prediction 
error $e_n(q, \hat{D}_n(q))$ to the receiver whenever the magnitude $|e_n(q, \hat{D}_n(q))|$ 
exceeds a threshold, thereby enabling the decoder to update its 
displacement estimate $\hat{D}_n(q)$ as in (8) as well as correcting its prediction 
$\hat{c}_n(q, \hat{D}_n(q))$ of coefficient $c_n(q)$. Both encoder and decoder use the 
updated displacement estimate in predicting the next coefficient, and 
the process continues. A simplified block diagram of the system that 
omits the thresholding operations is given in Fig. 2. 

In the sequel, it is convenient to rewrite (8) in a form that explicitly 
describes the iteration convention. This can be done by defining a 
single index i, 

$$i = qM + n; \quad i = 0, 1, 2, \ldots \tag{10a}$$

that equals the total number of iterations of (8) that have occurred in 
iterating from $\hat{D}_0(0)$ to $\hat{D}_n(q)$. Quantities q and n are related to i by 

$$n = ((i)) \tag{10b}$$

$$q = [[i]], \tag{10c}$$

where we use the notation ((z)) to denote z modulo M and [[z]] to 
denote the integer part of z/M. 

Using (10), we set $\hat{D}_i \triangleq \hat{D}_n(q)$ and rewrite (8) as 

$$\hat{D}_{i+1} = \hat{D}_i - \epsilon\, e_{((i))}([[i]], \hat{D}_i)\, G_{((i))}([[i]]) \tag{11}$$

with i = 0, 1, 2, .... Note that the Netravali-Robbins pel-recursive 
displacement estimation algorithm (3b) is a special case of (11) that 
results for a transform “block” having dimension 1 pel by 1 pel and 
single “basis vector” $\phi_0 = 1$. 

Fig. 2—Simplified diagram of motion-compensated hybrid transform-DPCM codec. 
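
The single-index recursion (11) can be sketched for a simplified case. The example below is an illustration under stated assumptions rather than the simulation reported in the paper: blocks are 1 × 4 pels on one scan line, the basis is a normalized 4-point sequency-ordered Hadamard set with M = N = 4, and the frames are an analytic sinusoid so that displaced values and gradients need no interpolation.

```python
import math

TRUE_D = 2.0     # true displacement in pels (illustrative)
EPSILON = 5e-4   # step size, chosen by hand for this synthetic signal
M = 4            # coefficients per block; here M = N = 4

# Normalized 4-point sequency-ordered Hadamard basis vectors phi_n.
PHI = [[0.5,  0.5,  0.5,  0.5],
       [0.5,  0.5, -0.5, -0.5],
       [0.5, -0.5, -0.5,  0.5],
       [0.5, -0.5,  0.5, -0.5]]


def prev_frame(x: float) -> float:
    """Previous-frame intensity along the line (synthetic, assumed)."""
    return 100.0 * math.sin(0.2 * x) + 128.0


def prev_gradient(x: float) -> float:
    """Analytic spatial derivative of the previous frame."""
    return 20.0 * math.cos(0.2 * x)


def cur_frame(x: float) -> float:
    """Present frame: the previous frame translated by TRUE_D."""
    return prev_frame(x - TRUE_D)


def estimate_displacement(num_blocks: int, d_hat: float = 0.0) -> float:
    """Iterate Eq. (11) over coefficients n of successive blocks q."""
    for i in range(num_blocks * M):
        q, n = divmod(i, M)                     # Eqs. (10c) and (10b)
        pels = [q * M + p for p in range(M)]    # pel positions of block q
        e_n = sum(w * (cur_frame(x) - prev_frame(x - d_hat))
                  for w, x in zip(PHI[n], pels))             # Eq. (7)
        g_n = sum(w * prev_gradient(x - d_hat)
                  for w, x in zip(PHI[n], pels))             # Eq. (9)
        d_hat -= EPSILON * e_n * g_n                         # Eq. (11)
    return d_hat


if __name__ == "__main__":
    print(estimate_displacement(50))
```

Replacing `PHI` with the single vector `[1.0]` and setting `M = 1` turns this loop into the pel-recursive update, which mirrors the special case noted in the text.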

We emphasize that recursions (8) and (11) were derived with the 
objective of decreasing the squared coefficient prediction errors of a 
hybrid transform-DPCM codec. As shown in Section III, coefficient 
prediction error is related to displacement estimation error 
(approximately) by a dot product between the displacement estimation error 
vector and vector $G_n(q)$ that describes the spatial rates of change of 
the coefficient estimate $\hat{c}_n(q, \hat{D}_n(q))$ with respect to small 
displacements of the block. Therefore, only the component of displacement 
estimation error in the direction of $G_n(q)$ contributes to coefficient 
prediction error, and it is this component that is relevant in evaluating 
the performance of (8) or (11). For this reason, experimental results 
given in this paper refer to the component of displacement estimation 
error measured in the direction of its corresponding coefficient gradient 
$G_n(q)$. 

Experimental illustrations of the behavior of (11) are given in Figs. 
3 through 5, where the moving object was the synthetically generated 
pattern of Fig. 6, displaced 2 pels in the horizontal direction each frame 
interval τ. This is a radial cosine function having a radius of 60 pels, 
and peak-to-peak amplitude 220 (out of an intensity range 0 to 255) at 
its center, decreasing to 130 at the circumference. The period P 
decreases with radial distance R, starting with a period of 20 pels at 
center to 10 pels at the circumference. The pattern is described 
mathematically by the intensity function 

$$f(R) = 100 \exp(-0.01R)\cos(2\pi R/P) + 128; \quad 0 \le R \le 60, \tag{12a}$$

where 

$$P = (1 - R/60)\,10 + 10. \tag{12b}$$

Fig. 3—Single-line convergence results using 2 × 8 separable Hadamard transform 
with e = 10°. 
Fig. 4—Single-line convergence results using 2 × 8 separable Hadamard transform 
with « = 10™. 
Fig. 5—Single-line convergence results using 2 × 8 separable Hadamard transform 
with e = 10°. 
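
The test pattern of (12a) and (12b) is straightforward to evaluate. The following sketch transcribes the two formulas as printed; the function names are illustrative, and the rendering of the pattern onto an interlaced raster is not attempted.

```python
import math


def period(r: float) -> float:
    """Eq. (12b): period in pels, 20 at the center and 10 at the circumference."""
    return (1.0 - r / 60.0) * 10.0 + 10.0


def intensity(r: float) -> float:
    """Eq. (12a): radial cosine pattern, defined for 0 <= r <= 60 pels."""
    if not 0.0 <= r <= 60.0:
        raise ValueError("pattern is defined only for 0 <= r <= 60")
    return 100.0 * math.exp(-0.01 * r) * math.cos(2.0 * math.pi * r / period(r)) + 128.0


if __name__ == "__main__":
    for r in (0.0, 15.0, 30.0, 45.0, 60.0):
        print(r, period(r), round(intensity(r), 2))
```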


Fig. 6—Synthetic image used in simulations. This image is described by eq. (12) of 
Section II. 

This function is displayed on a 256 × 256 element raster in two 
interlaced fields of 128 lines each. In applying (11), the spatial 
transforms were taken over a single field with coefficient prediction 
performed from the corresponding field separated in time by a frame 
interval τ. Figures 3 through 5 show displacement estimation error in 
the direction of the spatial gradient of the corresponding coefficient 
versus iteration for a sequence of 2 × 8 blocks located 10 field lines 
above the center of the figure. Iteration was initiated with a zero 
displacement estimate approximately 7 pels within the pattern for 
each horizontal sequence. In these examples and all others presented 
in this paper, the two-dimensional transforms concerned were 
separable transforms of the form’? 

$$C = V[I]H, \tag{13}$$

where C is the $N_r \times N_c$ coefficient matrix, [I] is the $N_r \times N_c$ matrix of 
pels, and V and H are unitary matrices having dimensions $N_r \times N_r$ 
and $N_c \times N_c$, respectively. Coefficient $c_n(q)$ of (5) is the nth column 
scanned coefficient of matrix C, with [I] taken to be the qth pixel block 
of the present frame. In Figs. 3 through 5, V and H were the normalized 
sequency-ordered 2 × 2 and 8 × 8 Hadamard matrices of Fig. 7, and 
iteration of (11) progressed through all M = 16 coefficients in a block 
(i.e., M = N). Figure 3 illustrates the behavior of (11) for « = 107°. It 
can be seen by inspection of this figure that displacement estimation 
error tends to decrease roughly in a series of steps of 16 iterations. 
Iterations 1 to 6, 17 to 22, etc., corresponding to the first few (low 
sequency) vectors in $\{\phi_n\}$, tend to affect error significantly, while the 
iterations corresponding to the higher sequency basis vectors do not. 
This type of behavior is scene-dependent and is investigated in the 
analysis of Section III. Figures 4 and 5 illustrate convergence for 
increased values of ε. In general, convergence rate increases gradually 
with increasing ε up to a point after which oscillations occur. 
Convergence to within 0.5-pel error was achieved for this particular example 
in two or three iterations for « = 10°* while the recursion became 
oscillatory (and eventually unstable as in Fig. 5) at values exceeding 
this. All these results are scene-dependent and, to some extent, 
dependent upon the row position of the sequence of blocks. Because of 
this, the behavior of (11) will be examined from a statistical viewpoint 
in Section III. 
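
The separable transform (13) with sequency-ordered Hadamard matrices can be sketched in plain Python. The construction used here (a Sylvester-type Hadamard matrix whose rows are sorted by number of sign changes) is a common way to obtain sequency ordering and is an assumption about how such matrices are built, not a description of the authors' software; all function names are illustrative.

```python
import math


def hadamard(n: int) -> list:
    """Natural-order (Sylvester) Hadamard matrix with +/-1 entries; n a power of 2."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def sequency_ordered(n: int) -> list:
    """Rows sorted by number of sign changes, scaled so the matrix is orthonormal."""
    def sign_changes(row):
        return sum(1 for a, b in zip(row, row[1:]) if a != b)
    scale = 1.0 / math.sqrt(n)
    return [[scale * x for x in row] for row in sorted(hadamard(n), key=sign_changes)]


def matmul(a: list, b: list) -> list:
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def separable_transform(block: list, v: list, h: list) -> list:
    """Eq. (13): C = V [I] H for an N_r x N_c block of pels."""
    return matmul(matmul(v, block), h)


if __name__ == "__main__":
    v, h = sequency_ordered(2), sequency_ordered(8)
    flat_block = [[10.0] * 8 for _ in range(2)]   # constant 2 x 8 block
    coefficients = separable_transform(flat_block, v, h)
    print([[round(c, 6) for c in row] for row in coefficients])
```

For a constant block only the lowest-sequency coefficient is nonzero, which matches the intuition that the low-sequency terms carry most of the information in smooth regions.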


III. PROPERTIES OF COEFFICIENT-RECURSIVE DISPLACEMENT 
ESTIMATION 


The displacement estimation procedure defined by (8) is a nonlinear 
recursion relation whose dynamic behavior is complicated by the fact 
that the error $e_n^2(q, \hat{D}_n(q))$ is generally a multimodal function of 
$\hat{D}_n(q)$, having global minimum at $\hat{D}_n(q) = D$ and local minima 
elsewhere. Convergent solutions to eq. (8) can, therefore, exist at 
displacement estimates other than the true displacement D. Subsequent 
analysis will restrict consideration to the case in which the displacement 
estimate is sufficiently close to the true displacement that $e_n^2(q, \hat{D})$ can 
be approximated as a quadratic function of $\hat{D} - D$. Under this 
restriction, the anomalous solutions can be ignored, and (8) reduces to an 
approximately linear stochastic recurrence relation. 

Section 3.1 derives the linearized approximation to (8) and the 
associated quadratic error expression. The dynamic behavior of the 
coefficient-recursive displacement estimator is analyzed in Section 3.2. 
An important result of this section is that, for ε sufficiently small, the
block-to-block convergence rate of mean displacement estimation error 
resulting from (8) is independent of the transform used for any unitary 
transform. 


(a) 2 × 2

     0.707   0.707
     0.707  -0.707

(b) 8 × 8

     0.353   0.353   0.353   0.353   0.353   0.353   0.353   0.353
     0.353   0.353   0.353   0.353  -0.353  -0.353  -0.353  -0.353
     0.353   0.353  -0.353  -0.353  -0.353  -0.353   0.353   0.353
     0.353   0.353  -0.353  -0.353   0.353   0.353  -0.353  -0.353
     0.353  -0.353  -0.353   0.353   0.353  -0.353  -0.353   0.353
     0.353  -0.353  -0.353   0.353  -0.353   0.353   0.353  -0.353
     0.353  -0.353   0.353  -0.353  -0.353   0.353  -0.353   0.353
     0.353  -0.353   0.353  -0.353   0.353  -0.353   0.353  -0.353

Fig. 7—Sequency-ordered Hadamard matrices. (a) 2 × 2. (b) 8 × 8.


3.1 Linear analysis


Assume that the pel intensities are samples of an object that is
undergoing pure translation D from frame to frame as in (1) so that,
neglecting background,

$$ I(x_q, t) = I(x_q - D, t - \tau). \tag{14} $$

For Euclidean norm $\|\hat{D}_n(q) - D\|$ sufficiently small, eq. (7) becomes,
by Taylor's expansion about $D = \hat{D}_n(q)$, to a linear approximation

$$ e_n(q, \hat{D}_n(q)) = [I(x_q - D, t - \tau) - I(x_q - \hat{D}_n(q), t - \tau)]^T \phi_n = G_n^T(q)\,\Delta_n(q), \tag{15} $$

where $G_n(q)$ is given by (9) and $\Delta_n(q)$ is the displacement estimation
error $\Delta_n(q) = \hat{D}_n(q) - D$. Using the approximation in (15), we can
approximate the squared coefficient estimate error by

$$ e_n^2(q, \hat{D}_n(q)) = \Delta_n^T(q)\,[G_n(q) G_n^T(q)]\,\Delta_n(q), \tag{16} $$

which is a quadratic function of the horizontal and vertical components
of $\Delta_n(q)$.
In terms of approximation (15), (8) assumes the form

$$ \Delta_{n+1}(q) = [U - \epsilon\,G_n(q) G_n^T(q)]\,\Delta_n(q) \tag{17a} $$

with

$$ \Delta_0(q) = [U - \epsilon\,G_{M-1}(q-1) G_{M-1}^T(q-1)]\,\Delta_{M-1}(q-1), \tag{17b} $$

where U is the 2 × 2 identity matrix. Similarly, (11) becomes

$$ \Delta_{i+1} = [U - \epsilon\,G_{((i))}([[i]])\,G_{((i))}^T([[i]])]\,\Delta_i, \quad i = 0, 1, \ldots, \tag{18} $$

where $\Delta_i \triangleq \Delta_n(q)$.
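A single step of the linearized recursion (17a) and the quadratic error (16) can be sketched numerically. This is an illustrative sketch only: the gradient vector, error vector, and step size below are made-up values, and the helper names are not from the paper.

```python
def update_error(delta, g, eps):
    """One step of (17a): delta <- [U - eps * g g^T] delta, for 2-vectors."""
    proj = g[0] * delta[0] + g[1] * delta[1]  # scalar G^T * Delta, as in (15)
    return (delta[0] - eps * g[0] * proj, delta[1] - eps * g[1] * proj)


def squared_coefficient_error(delta, g):
    """Quadratic form (16): Delta^T [G G^T] Delta = (G^T Delta)^2."""
    proj = g[0] * delta[0] + g[1] * delta[1]
    return proj * proj


if __name__ == "__main__":
    g = (4.0, 0.0)        # hypothetical coefficient gradient (horizontal only)
    delta = (2.0, 1.0)    # hypothetical displacement error in pels
    print(squared_coefficient_error(delta, g))
    delta = update_error(delta, g, eps=0.05)
    print(delta, squared_coefficient_error(delta, g))
```

Because each factor in (17a) is a rank-one correction, one step only reduces the error component along $G_n(q)$; in this example the vertical component is untouched, which is why iterating over several basis vectors (or blocks) is needed.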

Considering the image as a random process, (18) is a stochastic 
recurrence relation. Equations similar to, but somewhat simpler than, 
(18) have appeared in the problem of adaptive tap gain adjustment of 
automatic channel equalizers. Unfortunately, a complete statistical 
description of the behavior of these simpler equations has not yet been 
obtained. The difficulty in analyzing these equations is that their 
solution depends upon products of matrices that are statistically de- 
pendent. It has been found, however, that useful approximate results 
can be obtained by treating the dependent matrices as if they were 
actually independent."’ We use this method in Section 3.2 to analyze 
(18). As shown in Section 3.2 and Appendix A, there is some analytical 
justification for this approach because of certain properties of the 
transforms conventionally used in image coding. Further justification 
is given in the asymptotic analysis of Appendix B.


3.2 Statistical analysis


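This section studies the behavior of mean displacement estimation
error under the assumption that the sequence of gradient vectors
$G_{((i))}([[i]])$ entering (18) are statistically independent (“independence
assumption”). Note that, if the sequence of $G_{((i))}([[i]])$ in (18) are
independent, then $\Delta_i$ will be independent of the matrix premultiplying
it. This follows from the fact that $\Delta_i$ is determined by the $G_{((j))}([[j]])$,
$j < i$, which, by assumption, are independent of $G_{((i))}([[i]])$. The
ensemble mean of (18) (denoted by the overhead bar) then becomes

$$ \overline{\Delta}_{i+1} = [U - \epsilon\,\overline{G_{((i))} G_{((i))}^T}]\,\overline{\Delta}_i, \quad i = 0, 1, 2, \ldots. \tag{19} $$

In writing (19), we have used (9) and assumed a stationary ergodic
image ensemble. The matrix $\overline{G_n(q) G_n^T(q)}$ in this case will not depend
upon $x_q - \hat{D}_n(q)$ and is written simply as $\overline{G_{((i))} G_{((i))}^T}$. The matrix
$\overline{G_{((i))} G_{((i))}^T}$ is periodic in i with period M, having values specified by
$\overline{G_n G_n^T}$, n = 0, 1, ..., M − 1. Alternative expressions for $\overline{G_n G_n^T}$ are

$$ \overline{G_n G_n^T} = \overline{\nabla I^T(x)\,\phi_n \phi_n^T\,\nabla I(x)} \tag{20a} $$

and

$$ \overline{G_n G_n^T} = \begin{bmatrix} \phi_n^T R_1 \phi_n & \phi_n^T R_{12} \phi_n \\ \phi_n^T R_{12} \phi_n & \phi_n^T R_2 \phi_n \end{bmatrix}, \tag{20b} $$

where $R_1$, $R_2$, and $R_{12}$ are auto- and cross-correlation matrices of $\partial I/\partial x_1$
and $\partial I/\partial x_2$, and

$$ \nabla = \left[ \frac{\partial}{\partial x_1}, \; \frac{\partial}{\partial x_2} \right]. $$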

Equation (19) can be interpreted in terms of an optimum (Wiener)
displacement estimator. Consider the mean square nth coefficient
prediction error resulting from a given displacement error $\Delta_n(q) = \delta$.
To the linear approximation of (15), this is

$$ F_n(\delta) = \delta^T\,\overline{G_n G_n^T}\,\delta, \tag{21} $$

which is a quadratic function of the components of δ. A steepest
descent algorithm for arriving at the minimum of the $F_n(\delta)$ for
n = 0, 1, ..., M − 1 is

$$ \delta_{i+1} = [U - \epsilon\,\overline{G_{((i))} G_{((i))}^T}]\,\delta_i, \quad i = 0, 1, 2, \ldots. \tag{22} $$

Comparing (19) and (22), we see that, under the assumption of
mutually independent $G_{((i))}([[i]])$, the mean displacement estimation error
$\overline{\Delta}_i$ satisfies a recursion (22) that minimizes mean-square prediction
error of coefficients n = 0, 1, ..., M − 1. The convergence of (19) and
(22) is established below in Assertion 1.

We emphasize that recursion (19) describes the progression of mean
displacement estimation error under the assumption of independent
$G_{((i))}([[i]])$. The conventional approach to transform image coding is
to choose basis vectors $\{\phi_n\}$ so that the transmitted coefficients (or
the prediction errors of these coefficients) are as mutually “independent
as possible.” This is best achieved by choosing the basis vectors
$\{\phi_n\}$ of the transform to be the eigenvectors of the covariance matrix
(i.e., the Karhunen-Loeve basis) of the block of pels. This basis results
in transform coefficients that are linearly independent (uncorrelated)
within the transform block, although dependency among coefficients
from block to block may persist. Other transforms, such as the cosine
or Hadamard transforms, can be viewed as practical approximations to
the Karhunen-Loeve transform. If the Karhunen-Loeve basis vectors
also result in coefficient gradients that are linearly independent,
this helps justify the application of independence
theory in describing the behavior of (18). In Appendix A we show that
this assumption is indeed correct for the stochastic image model most
widely applied in image processing analyses. Some insight into the
behavior of (18) for dependent $G_{((i))}([[i]])$ is given by the analysis in
Appendix B.

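Assertion 1: Under the independence assumption, mean displacement
estimation error in (19) converges to zero if and only if the
eigenvalues of Ψ are inside the unit circle, where

$$ \Psi = \prod_{n=0}^{M-1} [U - \epsilon\,\overline{G_n G_n^T}]. \tag{23} $$

In our product notation (23), matrix $U - \epsilon\,\overline{G_0 G_0^T}$ is premultiplied by
$U - \epsilon\,\overline{G_1 G_1^T}$, etc.
Proof: This can easily be shown by iterating (19) from i = 0 to i = qM + n − 1:

$$ \overline{\Delta}_{qM+n} = \prod_{i=0}^{qM+n-1} [U - \epsilon\,\overline{G_{((i))} G_{((i))}^T}]\,\overline{\Delta}_0 = [U - \epsilon\,\overline{G_{n-1} G_{n-1}^T}] \cdots [U - \epsilon\,\overline{G_0 G_0^T}]\,\Psi^q\,\overline{\Delta}_0. \tag{24} $$

Therefore, the behavior of $\overline{\Delta}_{qM+n}$ as q increases depends upon the
matrix Ψ as $\Psi^q$, and Assertion 1 follows. QED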
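The condition in Assertion 1 can be checked numerically for a given set of mean matrices. The sketch below is illustrative only: the two 2 × 2 matrices standing in for $\overline{G_n G_n^T}$ are made-up values, and the helper names are assumptions of this example.

```python
import cmath


def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def psi_matrix(mean_ggt, eps):
    """Form Psi of (23); the factor for n = 1 premultiplies that for n = 0, etc."""
    psi = [[1.0, 0.0], [0.0, 1.0]]
    for a in mean_ggt:
        factor = [[1.0 - eps * a[0][0], -eps * a[0][1]],
                  [-eps * a[1][0], 1.0 - eps * a[1][1]]]
        psi = matmul2(factor, psi)
    return psi


def spectral_radius(m):
    """Largest eigenvalue magnitude of a 2 x 2 matrix (eigenvalues may be complex)."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))


if __name__ == "__main__":
    mats = [[[4.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 3.0]]]  # hypothetical
    for eps in (0.1, 2.0):
        r = spectral_radius(psi_matrix(mats, eps))
        print(eps, r, "converges" if r < 1.0 else "diverges")
```

Since Ψ is a product of symmetric factors it is not symmetric in general, so the eigenvalues are computed from the characteristic polynomial rather than with a symmetric-matrix formula.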

A useful sufficient condition for convergence of (19) is given in the
following.
Assertion 2: Under the independence assumption and for any
normalized set of basis vectors, mean displacement estimation error in
(19) is bounded by

$$ \|\overline{\Delta}_{qM+n}\| \le \kappa^q\,\|\overline{\Delta}_0\|, \tag{25} $$

where κ (0 < κ < 1) is the maximum eigenvalue of matrices
$U - \epsilon\,\overline{G_n G_n^T}$, n = 0, 1, ..., M − 1, and

$$ 0 < \epsilon < \frac{2}{N\left[\overline{(\partial I/\partial x_1)^2} + \overline{(\partial I/\partial x_2)^2}\right]}. \tag{26} $$

Proof: Using the Schwarz inequality on (24):

$$ \|\overline{\Delta}_{qM+n}\| \le \|U - \epsilon\,\overline{G_{n-1} G_{n-1}^T}\| \cdots \|U - \epsilon\,\overline{G_0 G_0^T}\|\;\|\Psi\|^q\,\|\overline{\Delta}_0\|, \tag{27} $$

where $\|(\cdot)\|$ for a symmetric matrix is the magnitude of the maximum
magnitude eigenvalue. Similarly,

$$ \|\Psi\| \le \prod_{n=0}^{M-1} \|U - \epsilon\,\overline{G_n G_n^T}\|. \tag{28} $$

The eigenvalues of $U - \epsilon\,\overline{G_n G_n^T}$ have the form $1 - \epsilon\lambda_i^{(n)}$ where the $\lambda_i^{(n)}$,
i = 1, 2, are the (nonnegative) eigenvalues of $\overline{G_n G_n^T}$. Let κ be the
largest of the norms in (28) and let $\lambda_{\max}$ be the largest of the $\lambda_i^{(n)}$ for
0 ≤ n ≤ M − 1. Then excluding trivial cases, each matrix norm in (27)
and (28) will be less than unity for $0 < \epsilon < 2/\lambda_{\max}$, and at least q of the
norms in (27) are κ. Therefore,

$$ \|\overline{\Delta}_{qM+n}\| \le \kappa^q\,\|\overline{\Delta}_0\| \tag{29} $$

for $0 < \epsilon < 2/\lambda_{\max}$.

Assume that $\lambda_{\max}$ corresponds to $\overline{G_n G_n^T}$ for n = k. Then using (20)
and the fact that the eigenvalues of a nonnegative-definite matrix are
bounded by the trace of the matrix gives the chain of inequalities

$$ \lambda_{\max} \le \mathrm{Tr}[\overline{G_k G_k^T}] \le \phi_k^T R_1 \phi_k + \phi_k^T R_2 \phi_k \le N\left[\overline{(\partial I/\partial x_1)^2} + \overline{(\partial I/\partial x_2)^2}\right]. \tag{30} $$

Therefore ε in the range (26) guarantees $0 < \epsilon < 2/\lambda_{\max}$ which, in turn,
guarantees (29). QED
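The step-size limits appearing in the proof of Assertion 2 can be illustrated with a short calculation. This sketch assumes hypothetical 2 × 2 mean matrices in place of $\overline{G_n G_n^T}$; it compares the limit $2/\lambda_{\max}$ with the more conservative trace-based limit suggested by the first inequality of (30), and evaluates the largest matrix norm (called `kappa` here, following the proof's use of norms).

```python
import math


def sym_eigenvalues(a):
    """Eigenvalues (largest first) of a symmetric 2 x 2 matrix."""
    mean = 0.5 * (a[0][0] + a[1][1])
    half = math.hypot(0.5 * (a[0][0] - a[1][1]), a[0][1])
    return mean + half, mean - half


def step_size_limits(mean_ggt):
    """Return (2 / lambda_max, 2 / largest trace) over the given matrices."""
    lam_max = max(sym_eigenvalues(a)[0] for a in mean_ggt)
    trace_max = max(a[0][0] + a[1][1] for a in mean_ggt)
    return 2.0 / lam_max, 2.0 / trace_max


def kappa(mean_ggt, eps):
    """Largest norm of U - eps*A over the matrices, i.e. max |1 - eps*lambda|."""
    return max(abs(1.0 - eps * lam) for a in mean_ggt for lam in sym_eigenvalues(a))


if __name__ == "__main__":
    mats = [[[4.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 3.0]]]  # hypothetical
    exact, conservative = step_size_limits(mats)
    print(exact, conservative, kappa(mats, conservative))
```

The trace-based limit never exceeds $2/\lambda_{\max}$ for nonnegative-definite matrices, which is the sense in which (26) is sufficient but not necessary.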

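Allowing for variations in scene statistics, a conservative choice of
ε would be somewhat less than $2/\lambda_{\max}$. In this event, the following
assertion applies.
Assertion 3: If iteration is taken over all coefficients (i.e., M = N) of
any complete orthonormal basis $\{\phi_n\}$, and if ε is small compared to
$2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of matrices
$\{\overline{G_n G_n^T};\ n = 0, \ldots, N - 1\}$, then the block-to-block convergence of $\overline{\Delta}_i$ is
independent of the particular basis set used. Furthermore, the
convergence rate is independent of block dimensions $N_r$ and $N_c$.
Proof: The block-to-block dynamics of $\overline{\Delta}_i$ is determined by the matrix
Ψ of (23). For $0 < \epsilon \ll 2/\lambda_{\max}$, and for M = N, Ψ can be approximated
by

$$ \Psi \approx U - \epsilon \sum_{n=0}^{N-1} \overline{G_n G_n^T}. \tag{31} $$

From (20a),

$$ \sum_{n=0}^{N-1} \overline{G_n G_n^T} = \overline{\nabla I^T(x) \left[ \sum_{n=0}^{N-1} \phi_n \phi_n^T \right] \nabla I(x)}. \tag{32} $$

But since $\{\phi_n\}$ is a complete orthonormal set, $\sum_{n=0}^{N-1} \phi_n \phi_n^T = U$, the
N-by-N identity matrix, and

$$ \Psi \approx U - \epsilon\,\overline{\nabla I^T(x)\,\nabla I(x)}, \tag{33} $$

which is independent of $\{\phi_n\}$.
The expectation in (33) is given by

$$ \overline{\nabla I^T \nabla I} = \overline{\begin{bmatrix} \sum (\partial I/\partial x_1)^2 & \sum (\partial I/\partial x_1)(\partial I/\partial x_2) \\ \sum (\partial I/\partial x_1)(\partial I/\partial x_2) & \sum (\partial I/\partial x_2)^2 \end{bmatrix}}, \tag{34} $$

where the summations are over the N pels of a block. By stationarity,
the expectation of each term in (34) is a constant and (33) becomes

$$ \Psi = U - \epsilon N \Gamma \approx (U - \epsilon\Gamma)^N, \tag{35} $$

where

$$ \Gamma = \begin{bmatrix} \overline{(\partial I/\partial x_1)^2} & \overline{(\partial I/\partial x_1)(\partial I/\partial x_2)} \\ \overline{(\partial I/\partial x_1)(\partial I/\partial x_2)} & \overline{(\partial I/\partial x_2)^2} \end{bmatrix}. \tag{36} $$

Therefore q block-to-block iterations of (19) premultiplies $\overline{\Delta}_0$ by
$(U - \epsilon\Gamma)^{qN}$, which is a function only of the total number of iterations qN
and is independent of basis and block dimension. QED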
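The key step of the proof, that the sum of $G_n G_n^T$ over a complete orthonormal basis does not depend on the basis, can be checked on a single block of sample data. The sketch below is illustrative: it uses one made-up block of N = 4 gradient samples instead of an ensemble average, and compares the pel (identity) basis with a normalized 4-point Hadamard basis.

```python
def outer_sum(grad, basis):
    """Sum over basis vectors of G_n G_n^T, returned as (s11, s12, s22).

    grad is a list of N (dI/dx1, dI/dx2) pairs, one per pel; each basis
    vector phi has N entries, and G_n = sum_k phi[k] * grad[k].
    """
    s11 = s12 = s22 = 0.0
    for phi in basis:
        g1 = sum(p * gx for p, (gx, _) in zip(phi, grad))
        g2 = sum(p * gy for p, (_, gy) in zip(phi, grad))
        s11 += g1 * g1
        s12 += g1 * g2
        s22 += g2 * g2
    return s11, s12, s22


IDENTITY4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
HADAMARD4 = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]

if __name__ == "__main__":
    grad = [(1.0, 0.5), (2.0, -1.0), (0.0, 3.0), (-1.0, 1.0)]  # hypothetical
    print(outer_sum(grad, IDENTITY4))
    print(outer_sum(grad, HADAMARD4))
```

Both calls return the entries of $\nabla I^T \nabla I$ for the block, matching the structure of (34); individual terms of the sum differ between bases, which is consistent with the basis-dependent within-block behavior discussed after Assertion 3.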
Experimental evidence of Assertion 3 is shown in Figs. 8a to 8d.
Figure 8a shows the relevant component of displacement estimation
error versus iteration number averaged over the interior of the moving
cosine pattern of Fig. 6 for a pel-recursive (unity) 1 × 1 transform and
a 1 × 8 Hadamard transform of the type in Fig. 7b using ε = 5 × 107°.
For each scan line entering the average, iteration of (11) was initiated
with displacement estimate $\hat{D} = 0$ just inside the circumference of the
pattern. In spite of the disparity between block size and transform
type, the block-to-block convergence rate (measured over spans of
eight iterations) of the Hadamard estimator closely matches that of
the pel-recursive estimator. Although Assertion 2 concerns average
displacement error, we found that it often applied as well to individual
scan lines as shown in Figs. 8b to 8d. These figures show the relevant
component of displacement estimation error versus iteration number
for the scan line running through the middle of the moving cosine
pattern. Hadamard 1 × 2, 1 × 4, 1 × 8, Slant 1 × 4, Haar 1 × 8, and
pel-recursive estimators are all seen to have similar convergence rates
with ε = 5 × 10°” when measured over the appropriate span of
iterations. This also applies to the cosine transform (not shown), which
was found to behave similarly to the Hadamard transform.

Fig. 8—Convergence results for various transforms with ε = 5 × 10° (error in pels
vs iteration number). (a) Pel-recursive and Hadamard 1 × 8 transform average
relevant displacement estimation errors vs iteration number. (b) Pel-recursive and
Hadamard 1 × 8 transform relevant displacement estimation errors for scan line
through middle of moving cosine pattern. (c) Hadamard 1 × 4 and 1 × 2 transform
relevant displacement estimation errors for scan line through middle of moving
cosine pattern. (d) Haar 1 × 8 and Slant 1 × 4 transform relevant displacement
estimation errors for scan line through middle of moving cosine pattern.

Figures 9a to 9c compare convergence of the 1 × 8 Hadamard block
and pel-recursive displacement estimators as ε increases. The image
data in this case was also the middle scan line of the moving cosine
pattern. It can be seen that the convergence rates of the Hadamard
and pel-recursive estimators are in rough agreement for increasing ε
up to the point where oscillations occur (Fig. 9c). Note that oscillations
occur in the Hadamard estimator before occurring in the pel-recursive
estimator. This behavior was found to be the case for other transform
types as well.

Although the block-to-block convergence rate of transform domain
displacement estimators is substantially independent of the transform
type, this is clearly not the case for within-block convergence rate, as
evidenced by Figs. 8 and 9. An explanation of this is given from the
form of Ψ in (23), in which particular basis vector $\phi_n$ contributes a
matrix factor of the form $[U - \epsilon\,\overline{G_n G_n^T}]$. This contribution of $\phi_n$ to
reducing average displacement estimation error depends upon the
eigenvalues of $\overline{G_n G_n^T}$, which are a measure of the statistical “match”
between $\phi_n$ and the spatial rates of change of the scene.

It is possible to vary ε with n ($\epsilon = \epsilon_n$) to partially compensate for
differences among the eigenvalues of matrices $\overline{G_n G_n^T}$. However, this
technique can also make the algorithm more sensitive to noise that
may be present in the image data. Section IV gives another approach
that appears to have particularly good noise rejection properties.

Fig. 9—Convergence results for pel-recursive and 1 × 8 Hadamard displacement
estimators. (a) ε = 1 × 10+. (b) ε = 2 × 10%. (c) ε = 4 × 10%.


IV. ADAPTATION 


This section shows how the coefficient displacement estimation 
algorithm of Sections II and III can be improved by adaptively updat- 
ing the displacement estimate according to the local features of the 
image. This is a technique that is not possible for the pel-recursive 
estimation. Adaptation in displacement estimation is motivated by the 
recognition that single frames of video are neither noiseless nor best 
described as stationary processes. Simulation results using noise-cor- 
rupted versions of the radial cosine object of Fig. 6 demonstrate that 
an adaptive algorithm of the type described here can have better 
convergence properties than either pel-recursive or nonadaptive coef- 
ficient recursive displacement estimation. 


4.1 Preliminaries 


The potential advantage of adaptation in (8) can be seen by
considering the simple example of the moving edge scene of Fig. 10. This
edge has constant slope g = 3.8 intensity increments per pel-to-pel
distance over a width of 50 horizontal pel intervals, and velocity 2.7
pels horizontal per frame. Within the width of the edge, $G_n(q)$ of (9)
is given by

$$ G_n(q) = \begin{bmatrix} g & g & \cdots & g \\ 0 & 0 & \cdots & 0 \end{bmatrix} \phi_n. \tag{37} $$

Fig. 10—Synthetic moving edge pattern.

For a sequency-ordered basis set $\{\phi_n\}$, $G_n(q)$ will be zero for all
ac-basis vectors with the result that the corresponding displacement
estimate updates will be determined solely by noise that may be
present on the edge. Updating the displacement estimate by iterating over
these basis vectors can only increase estimation error. The dc-basis
vector (all elements equal to $1/\sqrt{N}$), however, results in

$$ G_0(q) = g\sqrt{N} \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \tag{38} $$

which provides signal-dependent terms in (18) proportional to $\sqrt{N}$.
This suggests that, for this example, it may be better to iterate
repeatedly (say M times) over the dc-basis vector than to sequence
through the M basis vectors. We have not been able to analyze
rigorously the performance of such dc-basis iteration on the moving
edge in the presence of additive noise. In Appendix C we assume that
the additive noise is white with power $\sigma_w^2$, and invoke certain
assumptions regarding the independence of noise terms entering the
recurrence relation. The result is the following approximate expressions for
the horizontal component of displacement estimation error mean $\eta_\Delta(i)$
and steady-state variance $\sigma_\Delta^2$.

$$ \eta_\Delta(i) \approx \eta_\Delta(0)\,(1 - \beta)^i, \quad i = 0, 1, 2, \ldots \tag{39a} $$


2 
0% ~ 2a(1 + a) B? lon SE + | (39b) 
where $\beta = \epsilon N g^2$ and $\alpha = \sigma_w^2 / N g^2$.

Not too surprisingly, expressions (39) indicate that, for constant rate
of convergence (i.e., constant β), the steady-state displacement
estimate error variance decreases inversely with block size N. This points
out a possible advantage of transform domain displacement estimation
compared to pel-recursive displacement estimation where the dc-basis
block size is constrained to have dimension N = 1. Figures 11 through
13 show experimental and theoretical behavior of the horizontal
component of displacement estimation error for pel-recursive and 2 × 8
dc-basis iteration on the moving edge. In obtaining the sample mean
$\hat{\eta}_\Delta(i)$ and sample variance $\hat{\sigma}_\Delta^2$, averages were taken over 128 field lines.

Experimental and theoretical results are seen to compare favorably
in the pel-recursive estimator case (Fig. 11) at a signal-to-noise ratio
(SNR) = 45 dB. In Fig. 12, which describes displacement errors in the
pel-recursive estimator at 35 dB, there is approximate agreement
between theory and experimental data. This is also the case for the
2 × 8 dc-basis results of Fig. 13. From the experimental data in Figs. 12
and 13, it can be seen that the “dc-iteration” is more effective than
pel-recursive estimation in combating the effects of noise in the
displacement estimation of a moving edge.

Fig. 11—Horizontal component of displacement estimation error for noisy moving
edge (SNR = 45 dB) using pel-recursive displacement estimation. ε = 0.02.

Fig. 12—Horizontal component of displacement estimation error for noisy moving
edge (SNR = 35 dB) using pel-recursive displacement estimation. ε = 0.02.

The theoretical and experimental behavior of steady-state
displacement error variance $\sigma_\Delta^2(\infty)$ versus β is plotted in Fig. 14 for pel-recursive
displacement estimation for the moving edge pattern with SNR = 35 dB.
The range 1 < β < 2 represents oscillatory convergence with $\sigma_\Delta(\infty)$
increasing rapidly as β approaches 2. Note the trade-off between
convergence rate and accuracy evidenced by Fig. 14 and eq. (39a).
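The rate parameter $\beta = \epsilon N g^2$ ties together the step sizes quoted for the moving edge experiments. As a worked check, the sketch below evaluates β for the pel-recursive case (ε = 0.02, N = 1) and the 2 × 8 dc-basis case (ε = 0.00125, N = 16) with g = 3.8, and then applies the mean-error expression (39a). The initial error of 2.7 pels is an assumption made here for illustration (it corresponds to starting from a zero displacement estimate); the helper names are not from the paper.

```python
def rate_parameter(eps, n_pels, slope):
    """beta = eps * N * g^2, as defined after (39)."""
    return eps * n_pels * slope ** 2


def mean_error(initial_error, beta, i):
    """Mean horizontal displacement error after i iterations, per (39a)."""
    return initial_error * (1.0 - beta) ** i


def iterations_to_reach(initial_error, beta, target):
    """Smallest i for which the mean error magnitude of (39a) is below target."""
    i = 0
    while abs(mean_error(initial_error, beta, i)) >= target:
        i += 1
    return i


if __name__ == "__main__":
    g = 3.8
    beta_pel = rate_parameter(0.02, 1, g)       # pel-recursive, N = 1
    beta_dc = rate_parameter(0.00125, 16, g)    # 2 x 8 dc-basis, N = 16
    print(beta_pel, beta_dc)
    print(iterations_to_reach(2.7, beta_pel, 0.5))
```

Both cases give β of about 0.289, so the two estimators have the same predicted mean convergence rate and differ only in their noise behavior, which depends on N through α.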


4.2 Adaptation algorithm 


The adaptive displacement estimation algorithm proposed here
updates the displacement estimate (8) using that basis vector $\phi_{m_0}$
whose projection onto the computed coefficient gradient of the
reference frame has maximum amplitude. At each iteration step (n, q), the
magnitude of the coefficient gradient vector $\|G_m(q)\|$ is computed from
the noisy previous frame data for each vector in the given basis set
$\{\phi_m;\ m = 0, 1, \ldots, M - 1\}$. The particular basis vector $\phi_{m_0}$ maximizing this
quantity is then used for the displacement update.

Fig. 13—Horizontal component of displacement estimation error for noisy moving
edge (SNR = 35 dB) using 2 × 8 dc-basis iteration. ε = 0.00125. The factor ε was adjusted
in this experiment to give the same average convergence rate as in Figs. 11
and 12.
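The selection rule of the adaptive algorithm can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the previous-frame gradients are already available as one (horizontal, vertical) pair per pel of a 1 × 4 block, uses a 4-point sequency-ordered Hadamard basis as an example, and the names `coefficient_gradient` and `select_basis` are introduced here.

```python
HADAMARD4 = [
    [0.5, 0.5, 0.5, 0.5],      # dc
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]


def coefficient_gradient(grad, phi):
    """G_m = sum over pels of phi[k] * (dI/dx1, dI/dx2)[k]."""
    gx = sum(p * g[0] for p, g in zip(phi, grad))
    gy = sum(p * g[1] for p, g in zip(phi, grad))
    return gx, gy


def select_basis(grad, basis):
    """Index m0 of the basis vector with the largest ||G_m||."""
    def energy(m):
        gx, gy = coefficient_gradient(grad, basis[m])
        return gx * gx + gy * gy
    return max(range(len(basis)), key=energy)


if __name__ == "__main__":
    edge = [(3.8, 0.0)] * 4  # constant-slope edge, as in Section 4.1
    print(select_basis(edge, HADAMARD4))
```

For the constant-slope edge the dc vector is selected, in line with the observation in Section 4.1 that the ac-basis gradients vanish on such an edge; a gradient pattern that alternates in sign from pel to pel would instead select the highest-sequency vector.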

Figure 15 compares the performance of adaptive and nonadaptive
displacement estimation using a separable 2 × 4 cosine transform and
ε = 10°” for the test image of Fig. 16. This image consists of the moving
object of Fig. 6 corrupted by additive white Gaussian noise at 20-dB
SNR. Also shown are results for pel-recursive displacement estimation
at ε = 10°° and a convergence-rate-optimized ε = 4 × 10°°. The
pel-recursive scanning pattern was chosen according to Fig. 17 to match
the rate of progression of the pel-recursive algorithm along individual
scan lines to that of the 2 × 4 transform algorithm. (For example, after
40 iterations, both pel-recursive and coefficient-recursive displacement
estimators will have traversed 20 columns of a given scan line.) All
results of Fig. 15 are averages of relevant displacement error over the
interior of the moving object. It can be seen from Fig. 15 that the
adaptive 2 × 4 cosine displacement estimator has a clearly better
average error convergence rate than either the pel-recursive or
nonadaptive cosine displacement estimators. No choice of ε was found
that could improve the convergence rate of the pel-recursive estimator
beyond that shown for ε = 4 × 10°°. We also computed the
experimental standard deviations of relevant displacement estimation error.
Nonadaptive cosine and pel-recursive estimators had experimental
error standard deviations of 0.50 and 0.51, respectively, at ε = 10°°.
The error standard deviation of the adaptive coefficient-recursive
displacement estimator was 0.54, while that for the rate-optimized
pel-recursive estimator was an inferior 0.67.

Fig. 14—Estimation error standard deviation vs rate parameter β: pel-recursive at
SNR = 45.

Fig. 15—Displacement estimation errors for noisy synthetic image of Fig. 16.

Fig. 17—Pel-recursive scanning pattern applicable to test results of Fig. 15.

The above results demonstrate a potential advantage of adaptive 
coefficient-recursive displacement estimation over both pel-recursive 
and nonadaptive coefficient-recursive displacement estimation. It re- 
mains to be established, however, whether the adaptive scheme of this 
section will improve the performance of a motion-compensated hybrid 
transform-DPCM coder. 


V. SUMMARY 


This paper has introduced a coefficient-recursive displacement
estimator having potential application in motion-compensated
interframe hybrid transform-DPCM image coders. The convergence of the
mean displacement estimate to the true displacement was established
in Assertions 1 and 2 using assumptions that are supported by the
analyses of Appendices A and B. Assertion 3 described conditions
under which the rate of convergence of mean displacement estimation
error is independent of the transform block size and type. An extension
of the coefficient recursive algorithm was given in Section IV and 
shown by simulation to have improved convergence properties in the 
displacement estimation of noisy objects. 


APPENDIX A 


This appendix verifies a statement in Section III concerning the
orthogonality of coefficient vectors $G_n(q)$, n = 0, 1, ..., M − 1, for a
separable Markov image model. This model is described as follows.
Let $I_{m,n}$ denote the intensity of the pel located in the mth row and nth
column of the raster. Then for the Markov image model treated here:

$$ \overline{I_{m,n} I_{i,j}} = \sigma^2 \rho_r^{|m-i|} \rho_c^{|n-j|} \tag{40} $$

with $0 < |\rho_r| < 1$, $0 < |\rho_c| < 1$. In (40), $\sigma^2$ is the intensity variance, and
$\rho_r$ and $\rho_c$ are the correlation coefficients of adjacent pels between rows
and columns, respectively. If the pel intensities in an $N_r$ row by $N_c$
column block are indexed in column scan fashion and denoted by a
column vector I, then it can be shown that

$$ \overline{I I^T} = \sigma^2 M_r \times M_c, \tag{41} $$

where × denotes the Kronecker product, and $M_r$ and $M_c$ are,
respectively, $N_r \times N_r$ and $N_c \times N_c$ Toeplitz matrices having ijth entries

$$ [M_r]_{ij} = \rho_r^{|i-j|} $$

and

$$ [M_c]_{ij} = \rho_c^{|i-j|}. \tag{42} $$

Covariance models of this form have been widely applied in image
processing studies. Expressions for the eigenvalues and
eigenvectors of $M_r$ (or $M_c$) are given in Ref. 18.

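We now apply this model to a study of the covariance properties of
$G_n(q)$, n = 0, 1, ..., M − 1. From (9) we have

$$ \overline{G_n(q) G_m^T(q)} = \begin{bmatrix} \phi_n^T R_1 \phi_m & \phi_n^T R_{12} \phi_m \\ \phi_n^T R_{12}^T \phi_m & \phi_n^T R_2 \phi_m \end{bmatrix}. \tag{43} $$

As described in Section III, matrices $R_1$, $R_2$, and $R_{12}$ are auto- and
cross-correlation matrices of the spatial derivatives of I. For the
discrete image model specified by (40), we compute these spatial
derivatives as the corresponding spatial differences. For example, the
correlation of derivatives in the row direction of the raster is given by
$\overline{(I_{m,n} - I_{m,n-1})(I_{i,j} - I_{i,j-1})}$. Expanding this product and using (40) gives

$$ \overline{(I_{m,n} - I_{m,n-1})(I_{i,j} - I_{i,j-1})} = \sigma^2 \rho_r^{|m-i|} \left[ \alpha_c \rho_c^{|n-j|} + \beta_c \delta_{n,j} \right], \tag{44} $$

where $\alpha_c = 2 - \rho_c^{-1} - \rho_c$, $\beta_c = \rho_c^{-1} - \rho_c$, and $\delta_{n,j}$ is the Kronecker delta
function. By comparing (40) and (41) with (44), we have

$$ R_1 = \sigma^2 M_r \times [\alpha_c M_c + \beta_c U], \tag{45} $$

where U is an identity matrix.
Similarly, with $\alpha_r = 2 - \rho_r^{-1} - \rho_r$ and $\beta_r = \rho_r^{-1} - \rho_r$:

$$ R_2 = \sigma^2 [\alpha_r M_r + \beta_r U] \times M_c \tag{46} $$

and

$$ R_{12} = \sigma^2 [\alpha_r M_r + \beta_r U] \times [\alpha_c M_c + \beta_c U]. \tag{47} $$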
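The one-dimensional identity behind (44) can be verified numerically. Dropping the common factor $\sigma^2 \rho_r^{|m-i|}$ and writing k = n − j, expanding the product of differences under model (40) gives $2\rho_c^{|k|} - \rho_c^{|k-1|} - \rho_c^{|k+1|}$, which should equal $\alpha_c \rho_c^{|k|} + \beta_c \delta_{k,0}$. The sketch below checks this; the function names are introduced here for illustration.

```python
def difference_correlation(rho, k):
    """Normalized correlation of adjacent-pel differences at lag k under (40)."""
    return 2.0 * rho ** abs(k) - rho ** abs(k - 1) - rho ** abs(k + 1)


def closed_form(rho, k):
    """Right-hand side of (44) without the factor sigma^2 * rho_r^|m-i|."""
    alpha = 2.0 - 1.0 / rho - rho
    beta = 1.0 / rho - rho
    return alpha * rho ** abs(k) + (beta if k == 0 else 0.0)


if __name__ == "__main__":
    rho = 0.95  # hypothetical adjacent-pel correlation
    for k in range(-3, 4):
        print(k, difference_correlation(rho, k), closed_form(rho, k))
```

At k = 0 both expressions reduce to $2 - 2\rho_c$, the variance of an adjacent-pel difference, which is where the Kronecker delta term in (44) comes from.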


As shown in Ref. 13, the eigenvectors of a Kronecker product A ×
B have the form

$$ \begin{bmatrix} x_1^i\, y^j \\ x_2^i\, y^j \\ \vdots \\ x_N^i\, y^j \end{bmatrix}, $$

where $x_k^i$, k = 1, 2, ..., N, denotes the components of the eigenvector
$x^i$ of A and $y^j$ denotes an eigenvector of B. Since any vector is an
eigenvector of an identity matrix, it follows that the eigenvectors of
$R_1$, $R_2$, and $R_{12}$ above are in fact identical to the eigenvectors of the
image covariance matrix (41). The normalized eigenvectors form a
complete orthonormal set and are considered to be the optimum bases
for transform coding image blocks modeled by (41). Selecting $\{\phi_n\}$ to
be this set of eigenvectors, it follows that all terms in the matrix of
(43) will be identically zero for n ≠ m, which establishes the statistical
orthogonality of the $G_n(q)$.


APPENDIX B 


This appendix shows that mean displacement estimation error for
dependent $G_{((i))}([[i]])$ is approximately given by recursion (19) for
small ε.

Iterating (18) yields

$$ \Delta_{i+1} = \left( \prod_{l=0}^{i} [U - \epsilon\,G_{((l))}([[l]])\,G_{((l))}^T([[l]])] \right) \Delta_0. \tag{48} $$

The matrix product premultiplying $\Delta_0$ in (48) is a function of ε. Taylor's
expansion of this function about ε = 0 yields (for fixed i):

$$ \Delta_{i+1} = \left( U - \epsilon \sum_{l=0}^{i} G_{((l))}([[l]])\,G_{((l))}^T([[l]]) \right) \Delta_0 + O(\epsilon^2) \tag{49} $$

so that

$$ \overline{\Delta}_{i+1} = \left( U - \epsilon \sum_{l=0}^{i} \overline{G_{((l))} G_{((l))}^T} \right) \overline{\Delta}_0 + O(\epsilon^2). \tag{50} $$

On the other hand, repeating the steps of (48) and (49) on (19) gives
(for the independent $G_{((i))}([[i]])$ case):

$$ \overline{\Delta}_{i+1} = \left( U - \epsilon \sum_{l=0}^{i} \overline{G_{((l))} G_{((l))}^T} \right) \overline{\Delta}_0 + O'(\epsilon^2). \tag{51} $$

Comparing (50) and (51) we have the result that mean estimation
errors for the dependent and independent cases are equal to an
approximation $O(\epsilon^2) - O'(\epsilon^2)$, which can be neglected for sufficiently
small ε.
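The first-order expansion used in (49) can be checked numerically: the product of factors $[U - \epsilon A_l]$ differs from $U - \epsilon \sum_l A_l$ by a term that shrinks as $\epsilon^2$. The sketch below is illustrative only; the three symmetric 2 × 2 matrices stand in for the $G G^T$ terms and are made-up values.

```python
def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def exact_product(mats, eps):
    """Product of [U - eps*A_l], with later factors premultiplying earlier ones."""
    p = [[1.0, 0.0], [0.0, 1.0]]
    for a in mats:
        f = [[1.0 - eps * a[0][0], -eps * a[0][1]],
             [-eps * a[1][0], 1.0 - eps * a[1][1]]]
        p = matmul2(f, p)
    return p


def first_order(mats, eps):
    """U - eps * sum of A_l, the leading terms of the Taylor expansion in eps."""
    s = [[sum(a[r][c] for a in mats) for c in range(2)] for r in range(2)]
    return [[(1.0 if r == c else 0.0) - eps * s[r][c] for c in range(2)] for r in range(2)]


def max_abs_difference(mats, eps):
    p, f = exact_product(mats, eps), first_order(mats, eps)
    return max(abs(p[r][c] - f[r][c]) for r in range(2) for c in range(2))


if __name__ == "__main__":
    mats = [[[4.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 3.0]], [[2.0, -1.0], [-1.0, 2.0]]]
    for eps in (0.01, 0.005):
        print(eps, max_abs_difference(mats, eps))
```

Halving ε reduces the discrepancy by a factor close to four, which is the numerical signature of the $O(\epsilon^2)$ remainder.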


APPENDIX C 


This appendix briefly sketches the steps leading to (39).
We consider the successive frames of video to consist of a moving
edge contaminated by independent noise:

$$ I(x_q, t) = g(x_q - D) + w(x_q, t) $$
$$ I(x_q, t - \tau) = g x_q + w(x_q, t - \tau), \tag{52} $$

where the noise $w(\cdot, \cdot)$ is white with respect to both pel-to-pel and
frame-to-frame dimensions. An analysis similar to that leading to (18)
then yields

$$ \Delta_{i+1} = \Delta_i - \epsilon\,[G_n(q) + \nabla W^T(x_q - \hat{D}_i, t - \tau)\phi_n]\,[G_n^T(q)\Delta_i + W^T(x_q, t)\phi_n - W^T(x_q - \hat{D}_i, t - \tau)\phi_n]. \tag{53} $$

We consider iterating (53) repeatedly, using dc-basis vector $\phi_0 =
1/\sqrt{N}$ and $G_n(q)$ given by (38). Let Δ(i) denote the displacement error
in the horizontal direction at iteration i; $Z_1(i)$ denote the difference of
the two noise terms in the final term of (53); and $Z_2(i)$ denote the
horizontal component of the noise gradient term in (53). Then the
horizontal component of (53) becomes

$$ \Delta(i + 1) = \Delta(i) - \epsilon N g^2\,[\Delta(i) + Z_1(i)/\sqrt{N} g]\,[1 + Z_2(i)/\sqrt{N} g]. \tag{54} $$

Defining $\beta = \epsilon N g^2$, $\gamma = 1/\sqrt{N} g$ and rearranging terms yield

$$ \Delta(i + 1) = [1 - \beta(1 + \gamma Z_2(i))]\,\Delta(i) - \beta\gamma(1 + \gamma Z_2(i))\,Z_1(i). \tag{55} $$
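Recursion (55) lends itself to a quick Monte Carlo check of the mean behavior in (39a). The following sketch is illustrative and rests on assumptions not stated in the text: $Z_1(i)$ and $Z_2(i)$ are drawn as independent zero-mean Gaussian sequences, with variance $2\sigma^2$ for $Z_1$ (a difference of two noise terms) and $\sigma^2$ for $Z_2$, and the value of σ is arbitrary. Under these assumptions the sample mean of Δ(i) should follow $\Delta(0)(1 - \beta)^i$.

```python
import math
import random


def simulate_mean_error(delta0, beta, gamma, sigma, steps, runs, seed=0):
    """Average Delta(i) of recursion (55) over independent runs, i = 0..steps."""
    rng = random.Random(seed)
    sums = [0.0] * (steps + 1)
    for _ in range(runs):
        d = delta0
        sums[0] += d
        for i in range(1, steps + 1):
            z1 = rng.gauss(0.0, math.sqrt(2.0) * sigma)  # assumed variance 2*sigma^2
            z2 = rng.gauss(0.0, sigma)                   # assumed variance sigma^2
            d = (1.0 - beta * (1.0 + gamma * z2)) * d - beta * gamma * (1.0 + gamma * z2) * z1
            sums[i] += d
    return [s / runs for s in sums]


if __name__ == "__main__":
    g, n = 3.8, 16                       # edge slope and 2 x 8 block size
    beta = 0.00125 * n * g * g           # beta = eps * N * g^2
    gamma = 1.0 / (math.sqrt(n) * g)
    means = simulate_mean_error(2.7, beta, gamma, sigma=2.0, steps=10, runs=2000)
    for i, m in enumerate(means):
        print(i, round(m, 4), round(2.7 * (1.0 - beta) ** i, 4))
```

The same loop can be extended to accumulate squared deviations if an empirical steady-state variance is wanted for comparison with the theoretical expression.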


Note that each noise term in (55) is multiplied by a factor γ which,
for given β, decreases as $1/\sqrt{N}$. Neglecting dependencies between
$\{Z_2(i)\}$ and $\{Z_1(i)\}$, this equation is linear with respect to input $Z_1$
and output Δ. The solution has the form

$$ \Delta(i) = \Delta(0) \prod_{j=0}^{i-1} [1 - \beta(1 + \gamma Z_2(j))] + \sum_{k=0}^{i} h(i, k)\,Z_1(k), \tag{56} $$

where h(i, k) is the response of (55) for Δ(0) = 0 and input $Z_1(i) = \delta_{ik}$.
By assuming that $\{Z_1(i)\}$ and $\{Z_2(i)\}$ are mutually independent white
sequences, the mean and variance of Δ(i) can now be derived by a
tedious but conventional analysis, resulting in (39).


Interframe hybrid transform/DPCM coders encode television signals
by taking a spatial transform of a block of picture elements in a 
frame and predictively coding the resulting coefficients using the 
corresponding coefficients of the spatial block at the same location in 
the previous frame. These coders can be made more efficient for 
scenes containing objects in translational motion by first estimating 
the translational displacement of objects and then using coefficients 
of a spatially displaced block in the previous frame for prediction. 
This paper presents simulation results for such motion-compensated 
transform coders using two algorithms for estimating displacements. 
The first algorithm, which is developed in a companion paper, recur- 
sively estimates the displacements from the previously transmitted 
transform coefficients, thereby eliminating the need to transmit the 
displacement estimates. The second algorithm, due to Limb and 
Murphy, estimates displacements by taking ratios of accumulated
frame difference and spatial difference signals in a block. In this 
scheme, the displacement estimates are transmitted to the receiver. 
Computer simulations on two typical real-life sequences of frames 
show that motion-compensated coefficient prediction results in coder 
bit rates that are 20 to 40 percent lower than conventional interframe 
transform coders using “frame difference of coefficients.” Compari- 
sons of bit rates for approximately the same picture quality show that 
the two methods of displacement estimation are quite similar in 
performance with a slight preference for the scheme with recursive 
displacement estimation. 


I. INTRODUCTION


Television signals, which are generated by scanning a scene 30 times 
a second, contain a significant amount of frame-to-frame redundancy. 
A large part of this redundancy can be removed by the technique of 
conditional replenishment.’ In conditional replenishment, each frame
is segmented into two parts: background, which consists of picture
elements (pels) having intensities similar to the previous frame pels, 
and moving area, which consists of pels that differ significantly from 
the previous frame pels. Information is transmitted only about the 
moving area in the form of prediction errors and addresses of the 
moving area pels. Conditional replenishment schemes can be improved 
by estimating the displacement of objects in the scene and using the 
displacement estimate for predictive coding by taking differences of 
elements in the moving area with respect to appropriately displaced 
elements in the previous frame. Such schemes have been referred to 
as motion-compensated coding schemes.*" 

Transform domain methods have been widely discussed for band- 
width compression of still images or single frames.'” They can also be 
used for coding of sequences of television frames by taking a two- 
dimensional spatial transform followed by predictive coding using 
corresponding coefficients from the spatial transform of the previous 
frame.'*"’® This type of hybrid coding” relieves the storage problems 
associated with the use of three-dimensional transform blocks. Such a 
scheme can be made more efficient for scenes containing objects in 
motion by using, for prediction, coefficients of blocks from the previous 
frame that are spatially displaced from the present frame block by an 
amount equal to the displacement of objects. As in the pel domain, the 
success of motion compensation in transform coders depends upon: (i) 
the amount of purely translational motion of objects in the scene, (ii) 
the ability of the displacement estimation algorithm to estimate the 
translation with an accuracy necessary for good prediction of the 
coefficients, and (iii) the robustness of the displacement estimation 
algorithm when the resolution of the transmitted picture is changed to 
match the coder bit rate to the channel rate. 

In this paper, we use two previously published displacement esti- 
mation algorithms for motion-compensated transform coding. The first 
algorithm is an extension of a corresponding method in the pel do- 
main.’*"’ It works recursively on the previously transmitted transform 
coefficients of the present as well as the previous frame. It therefore 
requires no separate transmission of the displacement estimate. This 
algorithm is discussed in detail in a companion paper,’ where its 
properties are described both analytically and experimentally in terms 
of certain simple synthetically generated scenes. The other method of 
displacement estimation that we use is due to Limb and Murphy.” It 
estimates displacements in a block of pels using a ratio of accumulated 
frame difference and spatial difference signals from future as well 
as past data. These displacement estimates are nonrecursive and must 
be transmitted separately to the receiver. The present paper investi- 
gates the performance of the two displacement estimation algorithms 
in the context of interframe coders operating on real-life scenes that 
contain fairly complex (nontranslational) motion. Results are given 
here on the effects of various coder parameters such as block size, 
particular transform (Hadamard, cosine, etc.), and other parameters 
of the displacement estimators. The primary result of this paper is that 
the application of either recursive or nonrecursive motion estimation 
provides a 20 to 40 percent decrease in bit rate, compared to conven- 
tional, uncompensated hybrid transform/DPCM coding. We have found 
that the use of large block sizes in motion estimation degrades the 
coder performance. This may be a result of spatially nonuniform 
displacements being averaged over the transform block by the displace- 
ment estimator. Also, since the motion in real scenes is generally not 
uniform in rectangular blocks, as the block size is increased, only a 
fraction of elements in a block are compensable with a given displace- 
ment, and therefore transmitting coefficients of a larger block contain- 
ing some compensable and some uncompensable pels becomes ineffi- 
cient. 


II. HYBRID TRANSFORM CODING WITHOUT MOTION COMPENSATION 

In an interframe hybrid transform-DPCM coder, a field of video is 
partitioned into blocks having dimensions $N_r$ rows by $N_c$ columns, and 
a two-dimensional transform is performed on each block to obtain a 
set of coefficients. Transform coefficients of the qth block of the 
present frame are predicted by the corresponding coefficients of the 
qth block of the previously encoded frame, and, if the prediction error 
is above a specified threshold, the quantized prediction errors are 
transmitted to the receiver. These quantized errors are added to the 
coefficients predicted by the receiver, which inversely transforms the 
result to obtain an image for display at the receiver. A block diagram 
of an interframe hybrid transform-DPCM transmitter is shown in 
Fig. 1. Data compression is achieved both by the redundancy removal 
implicit in the prediction process and because some coefficients can be 
reproduced with low precision (or totally omitted from transmission) 
without visible degradation in the reconstructed picture. 
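
As an illustration of the block transform step, the following Python sketch computes a separable two-dimensional cosine transform of one block and inverts it. It assumes an orthonormal DCT-II basis and hypothetical function names; the paper does not specify the normalization used in the simulations.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_transform(block):
    """Separable 2-D cosine transform of one block (rows x columns)."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T

def inverse_block_transform(coeffs):
    """Inverse of block_transform; exact because the basis is orthonormal."""
    rows, cols = coeffs.shape
    return dct_matrix(rows).T @ coeffs @ dct_matrix(cols)

# A 2 x 4 block of arbitrary illustrative intensities.
block = np.array([[52.0, 55.0, 61.0, 66.0],
                  [63.0, 59.0, 55.0, 90.0]])
coeffs = block_transform(block)
restored = inverse_block_transform(coeffs)
```

With this normalization the first coefficient is the block sum divided by the square root of the number of pels, and the inverse transform returns the original block when no coefficient is quantized or dropped.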

The performance of the interframe hybrid transform-DPCM coder 
and the other coders described in later sections of this paper is 
evaluated in terms of bit rate for an acceptable subjective picture 
quality using two scenes, one called Judy and one Mike and Nadine. 
The coding degradation was judged in informal tests by the authors to 
be just perceptible from a viewing distance of six times the picture 
height. These scenes consist of 64 frames (2:1 interlaced fields) of 256 
x 256 samples each, obtained at 30 times a second and sampled at 
Nyquist rate from a video signal of 1-MHz bandwidth. The scene Judy 
contains head-and-shoulders views of a person engaged in a rather 
Fig. 1—Block diagram of hybrid transform/DPCM interframe image encoder. 


active conversation. The portion of a frame classified as moving area 
varies from 15 to 51 percent. The motion is not strictly translational, 
and there are different parts of the scene moving differently (such as 
lips, eyes, and head). Four frames of this scene are shown in Fig. 4 of 
Ref. 10. The scene Mike and Nadine contains a panned full-body view 
of two people briskly walking around each other on a set with severe 
nonuniform and time-varying illumination. The percentage of a frame 
classified as moving area varied from 92 to 96 percent. Four frames 
from this sequence are shown in Fig. 5 of Ref. 10. 

In our simulations of the interframe hybrid transform-DPCM (called 
conditional replenishment in the transform domain), the coefficients 
of two corresponding spatial blocks of the same field from two successive 
frames are compared, and if the difference is more than a threshold, 
the coefficient is transmitted. Thus, if $\{c_k\}_{k=0,\ldots,M-1}$ and 
$\{\hat{c}_k\}_{k=0,\ldots,M-1}$ are M selected coefficients (out of N coefficients in a 
block) of the present and coded previous frame blocks, respectively, 
then the quantized error, $Q_k[c_k - \hat{c}_k]$, is transmitted only if 
$|c_k - \hat{c}_k| \ge T_k$, where $Q_k[\cdot]$ is the quantizer for the kth coefficient, and $T_k$ is the 
threshold. If $c_k$ is not transmitted, then its value at the receiver is 
assumed to be $\hat{c}_k$. Thus the transmission consists of the quantized 
prediction error of the coefficients that were selected for transmission 
and the addresses of the coefficients that were dropped from the 
transmission. The information necessary to convey addresses of the 
coefficients selected for transmission was computed based on the run- 
length coding of runs of coefficients within a block and then from block 
to block. Parameters of the coder such as the number of coefficients 
that were entirely dropped from the transmission, the thresholds $\{T_k\}$ 
for selecting the transmitted coefficients, and the quantizer scales were 
adjusted* to produce pictures in which coding degradations were just 
perceptible. The entropies of the prediction errors and the run lengths 
specifying addresses of the transmitted coefficients are added to com- 
pute the total bit rate. 

The results are shown in Fig. 2, in which the bit rate is plotted as a 
function of the frame number for 60 frames. In these simulations and 
those of the next section, the coder was initialized so that it used the 
unquantized original first frame for prediction of the second frame. For 
comparison, the results from Ref. 10 are reproduced for conditional 
replenishment in pel domain. The comparison shows that, in the 
transform domain, using a cosine transform on a 2 X 4 block, there is 
a reduction of about 10 percent in bit rate over that obtained in pel 


* We do not claim that these adjustments resulted in an optimum set of parameters. 
However, a sufficiently large set of parameters was tried, giving us confidence that our 
results are not far from the optimum. 
Fig. 2—Performance of conditional replenishment and motion-compensated trans- 
form coders. Kilobits/frame are plotted as a function of the frame number for a typical 
sequence containing active motion of a head-and-shoulders view. 


domain.* For this particular case, we dropped the eighth coefficient 
entirely, and the prediction errors for the other seven coefficients were 
quantized with uniform quantization scales with step sizes of 3, 5, 5, 7, 
7, 9, 9, respectively. The thresholds $\{T_k\}$ for predictability were chosen 
to be 1, 2, 2, 3, 3, 4, 4 (out of 255) for the seven coefficients, respectively. 
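
The transmit rule of this section, with the step sizes and thresholds just quoted, can be sketched as follows. The rounding rule of the uniform quantizer (round to the nearest multiple of the step size) and the function name are assumptions; the paper states only that uniform quantization scales were used. The coefficient values in the example are invented for illustration.

```python
import numpy as np

# Values quoted in the text for the seven retained coefficients of the 2 x 4 block.
STEP_SIZES = np.array([3, 5, 5, 7, 7, 9, 9], dtype=float)   # quantizer step sizes
THRESHOLDS = np.array([1, 2, 2, 3, 3, 4, 4], dtype=float)   # predictability thresholds

def encode_block(present, previous_coded):
    """Apply the transmit rule to the retained coefficients of one block.

    Returns (transmitted, levels, reconstructed), where `reconstructed` is what
    both encoder and receiver hold for this block after coding.
    """
    error = present - previous_coded
    transmitted = np.abs(error) >= THRESHOLDS
    # Assumed quantizer: nearest multiple of the step size.
    levels = np.where(transmitted, np.rint(error / STEP_SIZES), 0).astype(int)
    reconstructed = previous_coded + levels * STEP_SIZES
    return transmitted, levels, reconstructed

present = np.array([100.0, 20.0, -8.0, 4.0, 0.0, 3.0, -1.0])
previous_coded = np.array([90.0, 19.0, -8.0, 12.0, 1.0, -7.0, -1.0])
transmitted, levels, reconstructed = encode_block(present, previous_coded)
```

In this example three of the seven coefficients exceed their thresholds; the remaining four keep their previous coded values at the receiver, and only their addresses (here, run lengths of untransmitted coefficients) would need to be conveyed.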

We varied some parameters of the transform to evaluate the sensi- 
tivity of these results to the block size and the type of transform used. 
Some of these are shown in Fig. 3. It is seen from this figure that a 
one-dimensional cosine transform with four elements did worse than 
the conditional replenishment in the pel domain (between 5 to 10 
percent). As the transform size was increased, the bit rate dropped; for 
2 x 2 block and cosine transform, the results were similar to the 
conditional replenishment in pel domain; the 2 X 8 block using the 
cosine transform, on the other hand, did about 15 percent better than 
the conditional replenishment in pel domain. We also tried different 
transforms and found that for small block sizes they were equivalent 
to the cosine transform but, as the block size was increased, the cosine 


* Of course, several other modifications can be made to improve the pel domain 
conditional replenishment. Our comparison is not meant to be a comparison between 
pel domain and transform domain coding in general. 
transform behaved better than the other transforms. The results for 
the 2 x 8 block using the Hadamard transform basis were very similar 
to those of 2 x 4 block and cosine transform but were inferior to those 
of the 2 x 8 block and cosine transform. 

Figure 4 shows the distribution of the bits required for addressing 
and for the transmission of the first coefficient. As is seen, the address- 
ing bits are about 50 percent of the total bits. This is a significant 
increase in addressing requirement compared to the conditional re- 
plenishment in the pel domain, where the addressing accounts for only 
about 20 to 30 percent of the total bits. This may be a result of using 
only the prediction error corresponding to the coefficient being coded 
for deciding whether that coefficient should be transmitted. This may 
have made the decision to transmit a coefficient unnecessarily noisy. 
We did, however, try several methods of reducing the addressing bits, 
but none of these resulted in an overall bit rate reduction. In the 
fraction of the bits that are required to send the prediction errors, 
those for the first coefficient account for more than half, as shown in 
Fig. 4. Thus the addressing and the first coefficient take up around 80 
percent of the total bits generated by the conditional replenishment 
coder. 
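
As a rough check on the 80-percent figure, let a denote the fraction of bits spent on addressing and f the fraction of the prediction-error bits spent on the first coefficient. Both symbols are introduced here only for illustration, and the value f of about 0.6 is implied by the quoted numbers rather than stated in the text.

```latex
\text{share} = a + (1 - a)\,f \approx 0.5 + 0.5\,f, \qquad
f > 0.5 \;\Rightarrow\; \text{share} > 0.75, \qquad
f \approx 0.6 \;\Rightarrow\; \text{share} \approx 0.8
```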
Fig. 3—Performance of conditional replenishment in the transform domain with 
various transforms. 
Fig. 4—Distribution of total compressed bits/frame into addressing information and 
that required for the transmission of the first coefficient. 


III. MOTION COMPENSATION WITH RECURSIVE DISPLACEMENT 
ESTIMATION 


In the motion-compensated hybrid transform-DPCM coder shown in 
Fig. 5, the nth coefficient of the qth present field block is predicted by 
the nth coefficient of either the displaced or the nondisplaced block of 
the previous frame, depending on which was better for the (n — 1)th 
coefficient, where the displacement is an estimate of the frame-to- 
frame translation of a moving object. The displacement estimation 
technique used in this section is identical to the one given in our 
companion paper.'’® We describe it as follows: Let $x_q = (x_{1q}, x_{2q})^T$ 
denote the coordinate of the upper left-hand pel of the qth block, 
where the blocks in each row of blocks are numbered from left to right 
with q = 0, 1, 2, ..., and superscript T denotes the transpose of a 
vector or matrix. The pel intensities of block q in a column-scanning 
fashion are denoted by a column vector $I(x_q, t)$. Let the nth basis 
vector of the transform be denoted by $\phi_n$, and, therefore, the nth 
coefficient of the qth block of the transform of the present frame can 
be written as 

$$c_n(q) = I^T(x_q, t)\,\phi_n. \tag{1}$$

The displaced previous frame value of this coefficient is 

$$\hat{c}_n(q, D) = I^T(x_q - D, t - \tau)\,\phi_n, \tag{2}$$


1710 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1979 


Fig. 5—Block diagram of motion-compensated transform/DPCM coder. (Diagram blocks: threshold comparison, address coder, multiplexer to channel, coder, quantizer, spatial gradient of coefficient, coefficient gradient, coefficient frame store, interpolation, fractional displacement, integral displacement.) 


where $I(x_q - D, t - \tau)$ is the column vector of intensities of the 
displaced qth block of the previous frame and D is the estimated 
displacement of the moving object. Computation of the elements in 
$I(x_q - D, t - \tau)$ generally requires an interpolation among the given 
previous-frame pel intensities. The displacement estimation algorithm 
attempts to minimize the prediction error in predicting $c_n(q)$ by 
$\hat{c}_n(q, D)$ by the steepest descent iteration of the form 

$$\hat{D}_{n+1}(q) = \hat{D}_n(q) - \frac{\epsilon}{2}\,\nabla_{\hat{D}_n}\,[e_n(q, \hat{D}_n(q))]^2$$
$$= \hat{D}_n(q) - \epsilon\, e_n(q, \hat{D}_n(q))\,\nabla I^T(x_q - \hat{D}_n(q), t - \tau)\,\phi_n \tag{3}$$

for n = 0, 1, ..., M - 2 and q = 0, 1, 2, ..., with 

$$\hat{D}_0(q) = \hat{D}_{M-1}(q-1) - \epsilon\, e_{M-1}(q-1, \hat{D}_{M-1}(q-1))\,\nabla I^T(x_{q-1} - \hat{D}_{M-1}(q-1), t - \tau)\,\phi_{M-1}, \tag{4}$$


where $e_n(q, \hat{D}_n(q))$ is the error in the prediction of $c_n(q)$, i.e., $c_n(q) - 
\hat{c}_n(q, \hat{D}_n(q))$, and M is the number of displacement iterations performed 
per block. Thus the iteration proceeds by first assuming the initial 
displacement estimate of the qth block, $\hat{D}_0(q)$, as an update from the 
final displacement estimate of the (q - 1)th block, $\hat{D}_{M-1}(q-1)$. The next 
displacement estimate of the qth block, $\hat{D}_1(q)$, is formed from eq. (3) 
with n = 0. Iteration progresses in the qth block from coefficient to 
coefficient, resulting finally in displacement estimate $\hat{D}_{M-1}(q)$. This 
iteration procedure continues along all horizontal blocks of the raster. 
The initial displacement of the leftmost block is assumed arbitrarily to 
be zero. 
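
The recursion of eqs. (3) and (4) can be illustrated with a minimal Python sketch. It is not the simulation program used for the results reported here: it omits thresholding and quantization of the prediction errors, uses the original rather than the coded previous frame, and assumes bilinear interpolation, a gradient computed with `numpy.gradient`, an orthonormal separable cosine basis, and a displacement stored as (row, column). All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def basis_images(n_rows, n_cols):
    """2-D cosine basis images; index 0 is the dc basis."""
    cr, cc = dct_matrix(n_rows), dct_matrix(n_cols)
    return [np.outer(cr[u], cc[v]) for u in range(n_rows) for v in range(n_cols)]

def bilinear(image, rows, cols):
    """Sample image at fractional (rows, cols) positions by bilinear interpolation."""
    rows = np.clip(rows, 0, image.shape[0] - 1)
    cols = np.clip(cols, 0, image.shape[1] - 1)
    r0 = np.minimum(np.floor(rows).astype(int), image.shape[0] - 2)
    c0 = np.minimum(np.floor(cols).astype(int), image.shape[1] - 2)
    fr, fc = rows - r0, cols - c0
    top = (1 - fc) * image[r0, c0] + fc * image[r0, c0 + 1]
    bottom = (1 - fc) * image[r0 + 1, c0] + fc * image[r0 + 1, c0 + 1]
    return (1 - fr) * top + fr * bottom

def estimate_along_row(present, previous, top, lefts, shape, n_iter, eps):
    """Run the recursion over blocks whose upper-left corners are (top, left).

    Returns the (row, column) displacement estimate held after the last block.
    """
    n_rows, n_cols = shape
    previous = previous.astype(float)
    phis = basis_images(n_rows, n_cols)[:n_iter]
    grad_r, grad_c = np.gradient(previous)
    rr, cc = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    d = np.zeros(2)  # the leftmost block starts from zero displacement
    for left in lefts:
        block = present[top:top + n_rows, left:left + n_cols].astype(float)
        for phi in phis:
            rows, cols = top + rr - d[0], left + cc - d[1]
            predicted = np.sum(bilinear(previous, rows, cols) * phi)
            error = np.sum(block * phi) - predicted
            grad = np.array([np.sum(bilinear(grad_r, rows, cols) * phi),
                             np.sum(bilinear(grad_c, rows, cols) * phi)])
            d = d - eps * error * grad  # steepest-descent update of the estimate
    return d

# Synthetic test: a horizontal ramp moved 1.5 pels to the right between frames.
columns = np.arange(64, dtype=float)
previous = np.tile(4.0 * columns, (8, 1))
present = np.tile(4.0 * (columns - 1.5), (8, 1))
d = estimate_along_row(present, previous, top=2, lefts=range(8, 40, 4),
                       shape=(2, 4), n_iter=5, eps=0.005)
```

For this ramp only the dc coefficient carries gradient information, so each block multiplies the remaining displacement error by 1 - 128 eps = 0.36, and after eight blocks the estimate is within a small fraction of a pel of 1.5. The step size here is chosen for the synthetic ramp and is unrelated to the value of 0.0001 used with real scenes.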

Such a motion-compensated transform encoder transmits a quantized 
version of the coefficient prediction error $e_n(q, \hat{D}_n(q))$ to the 
receiver whenever the magnitude $|e_n(q, \hat{D}_n(q))|$ exceeds a given threshold 
$T_n$, thereby enabling the decoder to update its displacement 
estimate $\hat{D}_n(q)$ as in eqs. (3) and (4), as well as correcting its prediction 
of coefficient $c_n(q)$. Both the encoder and decoder use the updated 
displacement estimate in predicting the next coefficient, and the proc- 
ess continues. We note that, since only previously transmitted infor- 
mation is used for displacement updating, no separate transmission of 
displacement is necessary. A simplified block diagram of the hybrid 
motion-compensated coder is shown in Fig. 5. 

The results of motion-compensated coding in the transform domain 
for the scene Judy are given in Fig. 2. In this figure, total bits per frame 
are plotted against the frame number. For purposes of comparison,* 


* It should be noted that motion compensation in pel domain used intensities of the 
previous field rather than frame, whereas motion compensation in transform domain 
used intensities of the previous frame. It was found that, for the pel domain case (Ref. 
10), previous field intensities give better results. 
this figure also shows conditional replenishment in the pel domain as 
well as in the transform domain and the two motion compensation 
techniques of Refs. 10 and 11 in the pel domain. Motion compensation 
in the transform domain is about 20 to 40 percent better than condi- 
tional replenishment in the transform domain. Also, motion compen- 
sation in the transform domain is better than one of the pel domain 
motion compensation techniques by about 5 to 10 percent. This pel 
domain technique is described in Ref. 10. It segments a frame into 
three types of areas: background, compensable moving area, and 
uncompensable moving area, and, therefore, requires a significant 
amount of address transmission. Motion compensation in the trans- 
form domain results in about 20-percent higher bit rates compared to 
the second motion compensation technique in pel domain. In this 
second technique, which is described in Ref. 11, each frame is divided 
only in two parts, predictable and unpredictable, and thus transmission 
of moving area address information is eliminated. 

Results of motion compensation in the transform domain which uses 
different types of transforms and block sizes are given in Fig. 6. This 
figure shows that the cosine transform with 2 x 4 block does the best. 
Increasing the block size increases the bit rate, perhaps as a result of 
the uncompensable area (i.e., the pels for which the prediction error is 
larger than threshold $T_n$) being in small isolated fragments. This result 
is in contrast with the result obtained with larger block sizes in 
conditional replenishment, where a larger block size, such as 2 X 8, 
gave better results than a smaller block size, such as 2 X 4. A one- 
dimensional transform, for example, the 1 x 4 block cosine transform, 
does worse than motion compensation A in the pel domain, whereas a 
2 x 2 block using the cosine transform, on the other hand, is equivalent 
to motion compensation A in the pel domain. 
Fig. 6—Performance of motion-compensated coder with different transforms. 
Figure 7 is a plot of the portion of the bits required for addressing 
and the error transmission for the first coefficient. It is seen that, with 
motion compensation in the transform domain, as in the pel domain, 
the addressing takes up a significant portion of the total bits. This 
portion varies from 40 to 60 percent of the total bits. The first 
coefficient transmission requires more than 50 percent of the bits 
required for transmission of the coefficients. Although the figure shows 
the results for 2 x 4 block and cosine transform, similar results were 
obtained for other transforms. 

We found that more coefficients could be dropped altogether from 
transmission in the motion-compensated transform coder than in the 
conditional replenishment transform coder. For example, with a 2 x 4 
block and the cosine transform, only five coefficients were needed in 
motion compensation, compared to the seven coefficients that were 
necessary for conditional replenishment. Unfortunately, however, the 
effect of dropping a larger number of coefficients did not result in a 
large bit-rate reduction, since the number of bits required for these 
coefficients was very small. 

The results of Fig. 6 were obtained by adjusting the quantization 
scales and the predictability thresholds $\{T_n\}$ in such a way that coding 
degradations in pictures were just perceptible in informal viewing by 
the authors. The quantization scales that we used were from uniform 
quantizers with step sizes of 3, 5, 5, 7, 7 (for the first five coefficients of 
the 2 X 4 cosine transform), and the predictability thresholds were 2, 
3, 3, 4, 4 (out of 255) for the first five coefficients. Coarser quantization 
of the higher order coefficients was possible in motion compensation, 
Fig. 7—Distribution of total compressed bits/frame into addressing information and 
that required for the transmission of the first coefficient. 
compared to conditional replenishment, without significantly degrad- 
ing picture quality. An increase of the predictability thresholds of the 
first coefficient resulted in rapid degradation of the picture quality. As 
the predictability threshold was increased, the block structure of the 
transform became clearly evident, and the frame-to-frame effects of 
the visibility of the block structure were found rather annoying. For 
higher order coefficients, however, the picture quality was not a 
sensitive function of the predictability thresholds. It appears that due 
to better prediction in motion-compensated transform coding, effects 
of coarser quantization and higher predictability thresholds are seen 
in smaller disjoint areas of the picture and, therefore, their visibility is 
lower. 

The recursive displacement estimation algorithm of eq. (3) was also 
varied by changing $\epsilon$ and by changing the order of iteration of the 
coefficients within a block. We found that varying $\epsilon$ did not change 
the bit rates significantly as long as $\epsilon$ was within a decade of 0.0001. As 
expected,'® much larger $\epsilon$ resulted in noisy estimates of displacement, 
whereas smaller values of $\epsilon$ took longer to converge. The order of 
iteration was varied by estimating displacement starting from the first 
coefficient to the fifth (for a 2 X 4 transform block and cosine transform 
with transmission of only five coefficients) or starting from the fifth 
coefficient to the first. This corresponded to iterating from low fre- 
quency to high frequency or vice versa. We found that going from high 
frequency to low frequency resulted in a smaller number of bits for 
transmission of the prediction error by about 5 to 10 percent. However, 
since amplitude bits account for only about 50 percent of the total bits, 
the overall reduction was only about 2 to 5 percent. Another variation 
that was tried consisted of iterating only the first (dc) coefficient for 
the entire block; that is, iterating the first coefficient five times instead 
of iterating all the five coefficients once. This variation resulted in 
performance which was very similar to the case in which all the 
coefficients were iterated. Iterating some other higher order coeffi- 
cients five times (with no iteration of the first coefficient), however, 
was found to be quite inferior. Although all the above conclusions are 
based on the scene Judy, similar conclusions are true for the scene 
Mike and Nadine. In general, as in pel domain,'”"! the bit rate for 
Mike and Nadine was much higher than that for Judy. It varied 
between 170 and 200 kilobits per frame for conditional replenishment 
in the transform domain, compared to 150 to 175 kilobits per frame for 
motion-compensated transform coding. 
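
The step from a 5 to 10 percent saving in prediction-error bits to an overall saving of about 2 to 5 percent, quoted above for the change in iteration order, follows from the roughly 50 percent share of amplitude bits. The symbols below are introduced for illustration, and the calculation assumes that the addressing bits are unaffected by the order of iteration.

```latex
\Delta_{\text{total}} \approx s_{\text{amp}}\,\Delta_{\text{amp}}
  = 0.5 \times (0.05 \text{ to } 0.10) = 0.025 \text{ to } 0.05
```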


IV. MOTION COMPENSATION WITH TRANSMITTED DISPLACEMENT 


In this section, we give results of estimating displacement by a 
technique proposed by Limb and Murphy” and then use it for motion- 
compensated prediction. The displacement computation was done in 
“displacement blocks” with varying sizes, such that the transform 
block was an exact submultiple of the displacement block in both 
dimensions. Also, coded values of intensities of the previous frame 
were used to obviate the need of an additional frame store. Having 
computed the displacement for the displacement block by the Limb/ 
Murphy algorithm, each transform coefficient within the transform 
block is predicted by using the displaced coefficient from the previous 
frame or the nondisplaced coefficient from the previous frame, de- 
pending on which was better for the previous coefficient of the same 
block. In this scheme, there is a tradeoff between the displacement 
block size and the total number of bits required for a given picture 
quality. A large displacement block size tends to average all the local 
variations of the displacement and, consequently, may not result in a 
good prediction; however, it requires less overhead for transmission of 
the displacement estimate. On the other hand, a small displacement 
block requires larger overhead but is potentially superior for displace- 
ment estimation in noiseless data. For real scenes, however, the quality 
of displacement estimation using small blocks might also suffer. 
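
A ratio-type block displacement estimate of the kind described above can be sketched as follows. The source states only that the estimate is a ratio of accumulated frame difference and spatial difference signals; the sign convention, the gradient operator (`numpy.gradient` on the present frame), the moving-area rule, and the function name are assumptions of this sketch and need not match the formulation of Limb and Murphy.

```python
import numpy as np

def ratio_displacement(present, previous, threshold=0.0):
    """Estimate one (row, column) displacement for a displacement block.

    Frame differences are accumulated over pels whose frame-difference
    magnitude exceeds `threshold`, signed by the local spatial difference,
    and divided by the accumulated magnitude of the spatial difference.
    """
    present = present.astype(float)
    previous = previous.astype(float)
    frame_diff = present - previous
    moving = np.abs(frame_diff) > threshold
    grad_r, grad_c = np.gradient(present)
    estimate = []
    for grad in (grad_r, grad_c):
        denom = np.sum(np.abs(grad[moving]))
        numer = np.sum(frame_diff[moving] * np.sign(grad[moving]))
        # A zero denominator means no spatial detail in this direction.
        estimate.append(-numer / denom if denom > 0 else 0.0)
    return np.array(estimate)

# Synthetic 8 x 16 displacement block: a horizontal ramp moved 2 pels to the right.
columns = np.arange(16, dtype=float)
previous = np.tile(3.0 * columns, (8, 1))
present = np.tile(3.0 * (columns - 2.0), (8, 1))
d = ratio_displacement(present, previous)
```

The estimate is a single vector for the whole block, which is why a large block averages out local variations of the displacement while a small block accumulates too few pels to suppress noise.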

Our simulations used three sizes for the displacement blocks: 16 × 
32 (i.e., 16 lines × 32 elements in the same field), 8 × 16, and 4 × 8. 
These blocks were approximately square, considering the interlace. 
Only a 2 X 4 transform block with the cosine transform was used. All 
the rest of the coder parameters were adjusted to generate pictures of 
approximately the same quality as before. For both the scenes, without 
accounting for bits required for transmission of displacement infor- 
mation, a displacement block size of 8 X 16 did the best in terms of 
bits per frame. For the scene Judy, displacement blocks of 16 x 32 and 
4 X 8 resulted in bit rates that were higher by approximately 1000 bits 
per frame and 2000 bits per frame, respectively. For the scene Mike 
and Nadine, similar comparisons resulted in about 3000 bits per frame 
and 5000 bits per frame. Also without accounting for those bits nec- 
essary for transmission of displacement information, the 8 X 16 dis- 
placement block resulted in bit rates comparable to those of previous 
sections with recursive displacement estimation for Judy, but about 
5000 to 7000 bits per frame higher for the scene Mike and Nadine. 
This, however, is a small percentage of the total bits transmitted per 
frame. As mentioned earlier, the schemes of this section require 
transmission of displacement information. We did not study any 
schemes to optimize transmission of this information. Assuming that 
each of the two displacement components can be specified by 8 bits, we would need 2024, 8096, 
and 32,384 bits per frame for 16 × 32, 8 × 16, and 4 × 8 displacement 
blocks, respectively. Clearly, considering the overall bit rate, displacement 
blocks of 8 × 16 and 16 × 32 are similar in performance, with a 
slight preference for the 16 × 32 block. The 4 × 8 displacement block 
is significantly worse, perhaps due to very noisy displacement esti- 
mates. Comparison of the techniques of this section with that of the 
previous section indicates a slight preference for recursive schemes. 
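
The tradeoff between prediction quality and displacement overhead can be tallied from the figures quoted in this section for the scene Judy. The sketch below only adds the approximate excess coefficient bits (relative to the 8 × 16 block) to the displacement overhead; the variable names are hypothetical, and the result is no more accurate than the rounded figures it starts from.

```python
# Approximate excess coefficient bits per frame, relative to the 8 x 16 block (Judy).
extra_coefficient_bits = {"16x32": 1000, "8x16": 0, "4x8": 2000}
# Displacement overhead per frame at 8 bits per displacement component.
displacement_bits = {"16x32": 2024, "8x16": 8096, "4x8": 32384}

relative_total = {size: extra_coefficient_bits[size] + displacement_bits[size]
                  for size in displacement_bits}
best = min(relative_total, key=relative_total.get)
```

On these numbers the 16 × 32 block comes out about 5000 bits per frame ahead of the 8 × 16 block, while the 4 × 8 block is penalized by more than 30,000 bits per frame of overhead.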


V. CONCLUSIONS 


We have developed two schemes for motion compensation in the 
transform domain. In the first scheme, displacement is estimated 
recursively from previously transmitted coefficients, whereas the sec- 
ond scheme estimates displacements in a block (generally larger than 
the transform block itself) using some past and future data, and, 
therefore, requires separate transmission of displacement information. 
We found that motion compensation resulted in bit rates which were 
20 to 40 percent better than the conventional hybrid interframe coders, 
which use frame difference prediction. Motion compensation in the pel 
domain was superior to that in the transform domain for the particular 
coder structures we investigated. The two methods of displacement 
estimation were quite similar in performance with a slight preference 
for the scheme with recursive displacement estimation. None of these 
comparisons was based on hardware complexity, and it is possible that 
hardware considerations may change the preferences. 
Optically Powered Speech Communication 
Over a Fiber Lightguide 


By R. C. MILLER and R. B. LAWRY 
(Manuscript received March 21, 1979) 


I. INTRODUCTION 


The photovoltaic conversion of optical power transmitted over a 
fiber lightguide can supply electrical power to low-drain semiconductor 
devices in remote locations. Acoustic powers comparable to those of 
conventional telephone ringers have been produced’ in this way by 
using a fiber-coupled GaAlAs photovoltaic detector’ to excite an elec- 
troacoustic tone generator. It was conjectured’ that the electrical 
power for other telephone functions—transmit/receive, dialing, and 
hook-status recognition—could also be optically supplied, but the 
signaling techniques appropriate to a dielectric fiber were left unspec- 
ified. This note describes the implementation of two-way speech 
communication between an electrically powered local station and an 
optically powered station located at the remote end of a 1.1-km-long, 
single-strand, optical fiber. The remote-station sound alerter has also 
been operated over this link. 


II. SPEECH SIGNALING METHOD 


The method used for two-way optically powered speech signaling is 
illustrated in Figs. 1 and 2. Figure 1 depicts schematically the electron- 
ics in an optically powered remote station and the optics in an 
electrically powered local station. Speech-modulated optical power 
was launched into the local station end of the fiber from a GaAs 
injection laser emitting at wavelength $\lambda_1$. The remote station contained 
a fiber-coupled GaAlAs double heterostructure transceiver,’ denoted 
PV/LED in Fig. 1, which functioned as a photovoltaic detector when 




Fig. 1—Schematic of speech signaling method. The fiber length is $l$. Its transmittance 
is $\exp(-\gamma_1 l)$ for laser light and $\exp(-\gamma_2 l)$ for electroluminescence, in which $\gamma_1$, $\gamma_2$ are loss 
coefficients at the laser and electroluminescence wavelengths. Optical filter $F(\lambda)$ is 
transparent to electroluminescence but attenuates laser light by factor $\alpha$. Specular 
component of reflection $P_r$ is blocked by aperture stop K; diffuse component is attenuated 
by $F(\lambda)$. Reflection $P_r'$ is zero during electroluminescence pulse $S'$. Symbols AM, 
PAM, PWM refer, respectively, to amplitude modulated, pulse-amplitude modulated, or 
pulse-width modulated voltage or current. Switches $H_1$ and $H_2$ are shown in OFF-hook 
state; ON-hook, $H_1$ is open and $H_2$ closed. The 1 V and 2 V values indicated for the 
remote station dc voltages are nominal. 


irradiated by light from the fiber and as an electroluminescent emitter 
when subjected to forward bias. One-volt dc electrical power was 
generated by photovoltaic conversion of the average power arriving 
over the fiber and was used to supply the remote station receive/ 
transmit circuits. The transceiver detected the speech modulation 
component of optical power, enabling its conversion by these circuits 
to acoustic power at the earphone, and it emitted electroluminescent 
pulses, amplitude-modulated by samples of the microphone output 
voltage. Other authors*”’ have discussed GaAlAs transceivers in which 
optical signals are generated and detected on a time-shared basis, with 
detection typically occurring under zero or reverse bias at very low 
optical powers. However, we are unaware of previous transceiver 
applications that combine injection electroluminescence with photo- 
voltaic power conversion, even though these two phenomena have long 
been known*""° as complementary effects in p-n junctions. 

A form of pulse-width (or pulse-edge-position) modulation was used 
to transmit speech information from the local to the remote station. 
The local station laser was turned off periodically at a rate of 1/T, 
equal at least to the Nyquist sampling rate for the speech bandwidth 
of interest, and was turned back on after a time T_M whose duration 
was modulated over an interval ±ΔT_M in correspondence with samples 
of the analog output of the microphone. This type of modulation is 
illustrated in Fig. 2a by the time dependence of the laser power P_r(λ1) 
reflected from the local station end of the optical fiber. The off-time 
(T_M + ΔT_M) was kept small compared to T to maximize the duty factor 
for transmitted power. 
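
As a rough illustration of the pulse-edge-position scheme described above, the following Python sketch maps normalized microphone samples onto laser off-times and evaluates the resulting duty factor. The 63.5-µs clock period is the value quoted in Section III. The nominal off-time and its maximum deviation are not stated in the text, so the values used here are assumptions chosen only for illustration, as are all function names.

```python
# Sketch of pulse-edge-position modulation of the laser off-time.
# T is taken from Section III; T_M and DELTA_T_M are ASSUMED values,
# since the text does not state them.

T = 63.5e-6          # clock period, seconds (Section III)
T_M = 3.0e-6         # ASSUMED nominal laser off-time, seconds
DELTA_T_M = 1.0e-6   # ASSUMED maximum deviation of the off-time, seconds


def sampling_rate(period=T):
    """Rate at which the laser is switched off, in hertz."""
    return 1.0 / period


def max_speech_frequency(period=T):
    """Highest speech frequency satisfying the Nyquist condition."""
    return sampling_rate(period) / 2.0


def off_time(sample, t_m=T_M, delta_t_m=DELTA_T_M):
    """Off-time for one normalized sample in the range [-1, 1].

    Samples outside the range are clipped, which plays the role of a
    speech limiter keeping the deviation below the nominal off-time.
    """
    clipped = max(-1.0, min(1.0, sample))
    return t_m + delta_t_m * clipped


def duty_factor(sample, period=T):
    """Fraction of the clock period during which the laser is on."""
    return 1.0 - off_time(sample) / period


if __name__ == "__main__":
    print(f"sampling rate: {sampling_rate() / 1e3:.2f} kHz")
    print(f"max speech frequency: {max_speech_frequency() / 1e3:.2f} kHz")
    for s in (-1.0, 0.0, 1.0):
        print(f"sample {s:+.1f}: off-time {off_time(s) * 1e6:.2f} us, "
              f"duty factor {duty_factor(s):.4f}")
```

With the stated period the switching rate is about 15.75 kHz, so the Nyquist condition is met for speech components up to roughly 7.9 kHz. Under the assumed off-times the laser remains on for more than 90 percent of each period.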

The optical pulses returned from the remote to the local station are 
illustrated in Figs. 2b and 2c. Laser power, reflected from the remote 
fiber-air interface and attenuated to level P_r'(λ1) by the fiber losses, 
arrived back at the local station after pulse round-trip time T_r. 
Electroluminescence pulse power, spectrally peaked at wavelength λ2, 
was generated by discharging capacitor C2 through transistor switch 
T2 with a slight delay T_s from the photovoltaic turn-off instant. A 
portion of this emission, proportional to the square of the fiber 
numerical aperture, entered the fiber-guided modes and reached the local 
station after attenuation to level S'(λ2). Thus, the time-varying optical 
power incident onto lens L2 consisted of the laser reflections, P_r(λ1) 
and P_r'(λ1), and the very much smaller electroluminescence power, 
S'(λ2), modulated over a range ΔS' in correlation with the sampled 
audio-frequency voltage on capacitor C2. If this flux, depicted 
schematically in Fig. 2d, is allowed to impinge directly onto detector D, the 
output becomes noisy and is sensitive to microphonics. This impairment 
was greatly lessened by the dichroic filter F(λ), which attenuated 
laser light by a large factor α relative to the longer wavelength 
electroluminescence. The specularly reflected power P_r(λ1) was 
attenuated by an additional large factor β by the aperture stop K. The relative 
levels of these various powers at detector D, with filter F(λ) and 
aperture stop K in place, are indicated in Fig. 2e. 
Fig. 2—Optical power levels in the local station. The laser is on most of the time. It 
is turned off at a rate 1/T equaling at least twice the highest speech frequency of 
interest, and is turned back on after a time T_M. The time T_M is variable over a range 
±ΔT_M (shaded area) in correspondence with the sampled, amplitude-modulated speech 
outgoing from the local station. (a) Laser power P_r(λ1) reflected from local station end 
of fiber. (b) Reflection P_r'(λ1) returned from remote end of fiber. The lightguide 
round-trip delay time is T_r. (c) Transmitted electroluminescence S'(λ2). The 
electroluminescence pulse is amplitude-modulated over a range ΔS'. Generation of this pulse is slightly 
delayed (delay time T_s) with respect to the remote transceiver turn-off time. (d) 
Superposition of P_r, P_r', and S' at lens L2. (e) Optical power incident onto photodetector 
D. Reflection P_r is reduced by a large factor β at aperture stop K. Filter F(λ) reduces 
remaining light at wavelength λ1 by a large factor α relative to wavelength λ2. 


III. IMPLEMENTATION 


The local and remote stations of Fig. 1 were linked by a 1.1-km-long, 
fused-silica fiber of 55-µm-diameter core and 0.22 numerical aperture, 
whose attenuation was 6.3 dB at the 0.825-µm wavelength of the 
double-heterostructure GaAlAs laser, and 5.5 dB for the PV/LED 
electroluminescence spectrum peaked at 0.870-µm wavelength. The 
round-trip transmission time was T_r = 11.5 µs. Lens system L1 converged the 
beam through a hole in mirror M to a focus at the fiber, with incidence 
angle such that specularly reflected light missed the mirror hole and 
did not refocus into the laser where it could cause cavity 
destabilization. Filter F(λ) utilized the absorption band edge in p-type 
Ga0.97Al0.03As to obtain approximately 3-dB attenuation of 
electroluminescence and more than 20 dB attenuation of laser power. 
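
The decibel figures quoted above can be converted to linear power factors with a few lines of Python. The sketch below is only an arithmetic illustration: the function and variable names are assumptions, the 20-dB laser attenuation is treated as a lower bound because the text says "more than 20 dB", and the filter discrimination is interpreted here as the ratio of the two filter transmissions.

```python
# Convert the quoted attenuations (dB) to linear power factors.
# Names are illustrative; numerical inputs are the values quoted in the text.

def db_to_factor(attenuation_db):
    """Linear power transmission factor for an attenuation given in dB."""
    return 10.0 ** (-attenuation_db / 10.0)


FIBER_DB_LASER = 6.3     # fiber attenuation at the 0.825-um laser wavelength
FIBER_DB_EL = 5.5        # fiber attenuation for the electroluminescence
FILTER_DB_EL = 3.0       # approximate filter attenuation of electroluminescence
FILTER_DB_LASER = 20.0   # lower bound on filter attenuation of laser power

fiber_laser = db_to_factor(FIBER_DB_LASER)
fiber_el = db_to_factor(FIBER_DB_EL)

# Relative discrimination of the filter against laser light (lower bound).
discrimination_db = FILTER_DB_LASER - FILTER_DB_EL
discrimination = 10.0 ** (discrimination_db / 10.0)

if __name__ == "__main__":
    print(f"fiber transmission, laser: {fiber_laser:.3f}")
    print(f"fiber transmission, electroluminescence: {fiber_el:.3f}")
    print(f"filter discrimination: at least {discrimination_db:.0f} dB "
          f"(factor of about {discrimination:.0f})")
```

On these figures the fiber passes a little under one quarter of the laser power in one transit, and the filter favors electroluminescence over laser light by at least 17 dB, a factor of about 50.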

The circuit branches labeled (a), (b), and (c) in the remote station 
section of Fig. 1 implement, respectively, the dc-powering, receive, and 
transmit functions defined earlier. The L-C filter of branch (a) provided 
ripple-free, 1.0-V power to the bipolar transistors in branches (b) and 
(c). Figure 3 depicts PV/LED voltage waveforms measured at an optical 
power sufficient to produce 1.61 mA of short-circuit current; the 
transceiver zero-volt level is indicated by horizontal arrows. The neg- 
ative voltage excursion during the optical power OFF-interval is caused 
by the L-C filter. The clock period T was chosen equal to the 63.5-µs 
broadcast-television line period in anticipation of future experiments. 
Amplifier A1 detected the modulation T_M + ΔT_M corresponding to the 
optical power OFF → ON transition of Fig. 2a and used constant-current 
charging of a capacitor to convert the pulse-width variations to an 
analog audio-frequency voltage at the earphone. The forward voltage 
needed to produce electroluminescence was obtained by rectifying and 
filtering the output of a multivibrator-chopper voltage upconverter 
(UC). The unrectified chopper output was shaped by an R-L-C circuit 
(PG) into a 1.3-µs-wide pulse, delayed by T_s = 0.2 µs from the optical 
turn-off instant, which discharged capacitor C2 through transistor T2 
to produce the electroluminescence pulse. The forward-voltage pulse 
modulation, labeled ΔV in Fig. 3, corresponds to speech modulation of 
the C2 voltage via the crystal microphone and audio amplifier A2. A 
compact tone generator (1.8-in. diameter × 0.5-in. thickness) of 
approximately 35 percent electroacoustic efficiency at 2.0 kHz was 
incorporated into the remote station telephone base along with the 
sound-alert circuits.’ These circuits were activated by depressing the 
telephone hook switch to close contacts H2 and open contacts H1. 

Fig. 3—Voltage waveforms at remote station transceiver. Audio-frequency 
modulation voltages were present at the local and remote station microphones for trace (a) and 
were absent for trace (b). Short horizontal arrows indicate the zero-voltage baselines. 
(Vertical scale, 1 V/cm; horizontal scale, 2 µs/cm.) 
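
The pulse-width-to-voltage conversion performed in the receive branch, described above as constant-current charging of a capacitor, follows the relation V = I t / C. The Python sketch below illustrates that relation only. The charging current and capacitance are assumed values, not taken from the circuit of Fig. 1, and the function names are hypothetical.

```python
# Sketch of pulse-width demodulation by constant-current charging.
# I_CHARGE and C_RAMP are ASSUMED values; the text does not give them.

I_CHARGE = 1.0e-3   # ASSUMED charging current, amperes
C_RAMP = 10.0e-9    # ASSUMED ramp capacitance, farads


def ramp_voltage(off_time_s, current_a=I_CHARGE, capacitance_f=C_RAMP):
    """Voltage reached when a capacitor charges at constant current
    for the duration of one optical off-interval (V = I * t / C)."""
    return current_a * off_time_s / capacitance_f


def demodulate(off_times_s):
    """Convert a sequence of off-times to an ac audio voltage by
    removing the mean (dc) level of the ramp voltages."""
    volts = [ramp_voltage(t) for t in off_times_s]
    mean = sum(volts) / len(volts)
    return [v - mean for v in volts]


if __name__ == "__main__":
    off_times = [2.0e-6, 3.0e-6, 4.0e-6]   # illustrative off-times, seconds
    for t, v in zip(off_times, demodulate(off_times)):
        print(f"off-time {t * 1e6:.1f} us -> audio sample {v:+.3f} V")
```

Because the ramp voltage is proportional to the off-time, the recovered audio sample is linear in the transmitted pulse-width variation, which is the property the receive branch relies on.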


IV. RESULTS AND DISCUSSION 


Satisfactory two-way speech transmission and vigorous sound alert- 
ing at the remote station have been obtained with 14 mW of dc- 
averaged laser power incident onto the fiber. This power is sufficient 
to produce short-circuit currents of 1.45 mA in the PV/LED transceiver 
at the remote end of the 1.1-km fiber. Quantitative studies of audio 
quality in this link have not been made; however, most observers 
describe the speech quality (noise, distortion, frequency response) at 
the remote and local earphones as excellent. 
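
The figures quoted above permit a rough estimate of the power reaching the remote transceiver and of its implied responsivity. The Python sketch below is a back-of-envelope calculation under two stated assumptions: that all 14 mW incident onto the fiber is coupled into guided modes, and that the 6.3-dB fiber attenuation from Section III is the only loss. Any coupling or reflection loss would reduce the delivered power, so the responsivity obtained here should be read as a lower bound. Variable names are illustrative.

```python
# Rough link budget for the optical powering path (illustrative only).
# ASSUMPTIONS: perfect coupling into the fiber; fiber attenuation is the
# only loss between the laser and the remote transceiver.

P_INCIDENT_MW = 14.0     # dc-averaged laser power incident onto the fiber
FIBER_DB = 6.3           # fiber attenuation at 0.825 um (Section III)
I_SC_MA = 1.45           # short-circuit current at the remote transceiver
WAVELENGTH_UM = 0.825    # laser wavelength, micrometers
HC_OVER_Q = 1.23984      # h*c/q in eV*um (equivalently W*um/A)

# Optical power arriving at the remote end of the fiber, in milliwatts.
p_remote_mw = P_INCIDENT_MW * 10.0 ** (-FIBER_DB / 10.0)

# Implied responsivity in A/W (mA per mW is numerically the same).
responsivity = I_SC_MA / p_remote_mw

# Implied external quantum efficiency: eta = R * (h c / q) / wavelength.
quantum_efficiency = responsivity * HC_OVER_Q / WAVELENGTH_UM

if __name__ == "__main__":
    print(f"power at remote end: {p_remote_mw:.2f} mW")
    print(f"implied responsivity: {responsivity:.2f} A/W")
    print(f"implied quantum efficiency: {quantum_efficiency:.2f}")
```

Under these assumptions roughly 3.3 mW reaches the remote station, corresponding to a responsivity of about 0.44 A/W and an external quantum efficiency near 0.66.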

Noise in the remote earphone is inaudible, and excellent speech 
quality is obtained at the remote station with laser powers smaller 
than 10 mW incident onto the fiber. Speech volume at this earphone 
can easily be made uncomfortably loud, but distortion only appears if 
the local station speech-limiter circuits are maladjusted to permit ΔT_M 
to approach T_M. A low-level, white-noise-type hiss is present in the 
local station earphone under most operating conditions. This noise is 
associated with residual laser radiation present at detector D, and its 
volume can be varied from barely audible to intolerably loud by 
adjusting the position of optical filter F(λ). 

The use of a remote station detector with high photovoltaic effi- 
ciency’ is essential to constructing a remotely located telephone all of 
whose power can be delivered from a central office. The optical 
complexity of the remote station is minimized by employing a unitary 
transceiver which takes advantage of the physical kinship between 
photovoltaic conversion and injection electroluminescence. Alternative 
arrangements in which the source and detector functions are 
performed in separately optimizable devices do, however, possess 
advantages, including the potential for greatly improving the source 
brightness and for simplifying the time-sharing schemes. Several such 
arrangements are currently being investigated and will be reported later. 